The genome in the nucleus of a eukaryote contains the instructions for the activity of a cell. These instructions, written in a four-letter alphabet of nucleotides, are first transcribed into RNA and then translated into proteins. The microarray is particularly suitable for measuring the transcription levels of different genes in different cells or conditions. Setting up a microarray experiment is a long and intricate process that involves several steps.
DNA is described as a double helix. It looks like a long twisted ladder. The sides of the ‘ladder’ are formed by a backbone of sugar and phosphate molecules, and the ‘crosspieces’ consist of two nucleotide bases joined weakly in the middle by hydrogen bonds. On either side of the ‘rungs’ lie complementary bases. Every adenine base (A) is paired with a thymine base (T), whereas every guanine base (G) has a cytosine partner (C) on the other side. Therefore, the strands of the helix are each other’s complement. It is this basic chemical fact of complementarity that lies at the basis of each microarray.
Microarrays have many single strands of gene sequence segments, known as probes, attached to their surface. This attachment is sometimes achieved by physically spotting them on the array and sometimes by immobilizing them to the quartz wafer surface via hydroxylation, as in Affymetrix arrays. In the future, undoubtedly, other media will become available. The single strands are waiting for complementary strands to hybridize and stick to the surface of the array. RNA delivers DNA’s genetic message to the cytoplasm of a cell, where proteins are made. Chemically speaking, RNA is similar to a single strand of DNA.
The purpose of a microarray is to measure, for each gene in the genome, the amount of message that was broadcast through the RNA in the case sample compared to the control sample. Roughly speaking, color-labelled RNA is applied to the microarray, and if the RNA finds its complementary sibling on the array, then it naturally binds and sticks to the array. By measuring the amount of color emitted by the array, one can get a sense of how much RNA was produced for each gene (see the figure below).
Although some microarray experiments have been performed using RNA directly, most scientists prefer to work with the more stable cDNA molecule, which is the complementary copy of RNA. It is produced by an enzyme, reverse transcriptase, that acts as a little inverse copy machine. It copies a T for each A, an A for each U, a C for each G and a G for each C. In this way, it creates the inverse image of the usual RNA. The RNA itself is a single strand of nucleotides, that is, effectively a string of four letters, A, U, G and C. Naturally, these letters tend to bind to T, A, C and G, respectively, when they are present. This is the principle behind microarray technology.
To avoid RNA binding to itself, it is heated to 65 degrees Celsius. In this way, any self-hybridization of the RNA that has taken place is undone. After 5 minutes, the tubes are quickly cooled by putting them on ice for 2 minutes. In this way, any rehybridization of the RNA is prevented because the temperature is too low.
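The base-pairing rule that the enzyme follows when copying RNA into cDNA can be written down in a few lines of R. This is only a sketch for illustration: the function name rna_to_cdna and the reverse argument are my own assumptions and are not part of any package used later in this workflow.

```r
# Hypothetical helper: map each RNA base to its complementary DNA base
# (A -> T, U -> A, G -> C, C -> G), as described in the text.
rna_to_cdna <- function(rna, reverse = FALSE) {
  stopifnot(is.character(rna), length(rna) == 1, grepl("^[AUGC]+$", rna))
  cdna <- chartr("AUGC", "TACG", rna)
  if (reverse) {
    # The two strands run in opposite directions, so the copy is often
    # written reversed (5' to 3').
    cdna <- paste(rev(strsplit(cdna, "")[[1]]), collapse = "")
  }
  cdna
}

rna_to_cdna("AUGGCC")                  # "TACCGG"
rna_to_cdna("AUGGCC", reverse = TRUE)  # "GGCCAT"
```

The same rule, applied in the other direction, is why a labelled cDNA strand sticks to the matching probe on the array.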
In all microarray experiments, it is essential that the biological material should be selected under controlled conditions. Failure to do so might increase spurious variation as a result of some uncontrolled nuisance factor. A total of 100 micrograms of RNA should be obtained for both samples. This quantity is sufficient for a single microarray. Modern techniques allow one to do the experiment even with a smaller quantity of RNA.
In this figure, I obtain a copy of the double-stranded cDNA from the original mRNA that was purified from lysed cells. The end product of a microarray experiment is an image with gene spots of varying intensity for each of the treatment and the control samples.
A smart way of making genes ‘visible’ has been developed by adding a dye to the cDNA so that the amount of cDNA that sticks to the microarray slide can be measured via an optical scanner. In order to build cDNA, the enzyme needs the nucleotide building blocks, A, T, G and C. Rather than adding 25 microliters of plain C nucleotides, the idea is to add 23 microliters of C’s that have a dye molecule attached to them, as well as 2 microliters of plain C’s. Each time the enzyme needs a C to copy a G, it will most likely use one with a dye molecule attached. Therefore, the number of dye molecules present in the cDNA is proportional to the number of G’s in the RNA, which is roughly proportional to the number of transcribed copies of the gene, as well as the length of the transcript.
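The proportionality argument above can be made concrete with a little arithmetic. The sketch below assumes that labelled and plain C's have the same concentration and are picked by the enzyme with equal probability, which is a simplification of my own; the function names and the example sequence are hypothetical.

```r
# Fraction of C nucleotides carrying a dye: 23 of 25 microliters.
labelled_fraction <- 23 / (23 + 2)

# Count the G's in an RNA sequence; each G is copied into one C in the cDNA.
count_g <- function(rna) sum(strsplit(rna, "")[[1]] == "G")

# Expected number of dye molecules for a given number of transcript copies
# (assumes every C is drawn independently from the 23:2 mixture).
expected_dyes <- function(rna, copies = 1) {
  labelled_fraction * count_g(rna) * copies
}

labelled_fraction                      # 0.92
expected_dyes("AUGGCGG", copies = 10)  # 4 G's x 10 copies x 0.92 = 36.8
```

Doubling either the number of copies or the number of G's doubles the expected signal, which is why both expression level and transcript length influence the measured intensity.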
Two different dyes, Cy3 and Cy5, are used to distinguish treatment and control samples. The reverse transcription can finally start when a reverse transcriptase is added. The enzyme is stored at low temperatures because it degrades quickly at higher temperatures. It performs its best RNA copying activity, however, in a 42 degrees Celsius environment. The control and treatment samples are brought up to this temperature immediately prior to adding the enzyme. Then the enzyme immediately starts its job. If enzyme degradation goes faster than normal, it is sometimes advisable to add some additional enzyme after one hour. Otherwise, the enzyme is allowed to do its job for a total of two hours. At this time, enough cDNA has been produced to be applied to the microarray.
Not all the loose bases that were added to the RNA sample have been reverse transcribed into cDNA. These loose bases could possibly hybridize spuriously with the immobilized DNA on the array, and it is therefore sensible to filter them out. The mixture is passed through a membrane. The long strands of cDNA stick to the membrane, whereas the loose bases pass through. The RNA possibly sticks to the membrane as well, but since it is not labelled, this is immaterial. By turning the membrane the other way around, the labelled cDNA is recovered. The cDNA mixture is then dried down in a centrifuge in order to replace the liquid by a hybridization buffer. This hybridization buffer facilitates the kinetics of the actual hybridization, that is, the attachment of the cDNA produced from the sample of interest to the DNA material on the slide.
So far, all the steps have been performed for the treatment and control sample side by side. At this point, the two samples are combined in, hopefully, exactly equal quantities. The resulting mixture is then ready for hybridization to a single microarray.
Note that the cDNA, when left at room temperature for a while, may start to fold onto itself if there are complementary stretches in the cDNA sequence. This would inhibit hybridization to the array, and therefore steps have to be taken to avoid this folding while the microarray is being prepared. By heating the cDNA mixture to 85 degrees Celsius for 5 minutes and then shock-cooling it by putting it on ice, the self-folding of the cDNA is prevented.
The raw CEL files are produced by the array scanner software and contain the measured probe intensities.
Each dataset at ArrayExpress is stored according to the MAGE-TAB (MicroArray Gene Expression Tabular) specifications as a collection of tables bundled with the raw data. The MAGE-TAB format specifies up to five different types of files: the Investigation Description Format (IDF), which contains top-level information about the experiment including title, description, submitter contact details and protocols; the Array Design Format (ADF); the Sample and Data Relationship Format (SDRF), which contains essential information on the experimental samples; and the raw and processed data files.
Before I move on to the actual raw data import, I will briefly introduce the ExpressionSet class contained in the Biobase package. It is commonly used to store microarray data in Bioconductor. The ExpressionSet class is designed to combine several different sources of information into a single convenient structure. An ExpressionSet can be manipulated and is the input to or output of many Bioconductor functions.
The data in an ExpressionSet consist of:
1) AssayData: Expression data from microarray experiments with microarray probes in rows and sample identifiers in columns.
2) Metadata:
   a) PhenoData: A description of the samples in the experiment with sample identifiers in rows and description elements in columns; holds the content of the SDRF file.
   b) FeatureData: Metadata about the features on the chip or technology used for the experiment, with the same rows as AssayData by default and freely assignable columns.
   c) Further annotations for the features.
3) ExperimentData: A flexible structure to describe the experiment.
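To make the structure listed above more tangible, here is a minimal sketch that builds a tiny ExpressionSet by hand. The probe names, sample names and group labels are made up for illustration and the expression values are random numbers; only the Biobase functions ExpressionSet(), AnnotatedDataFrame(), exprs(), pData() and sampleNames() are real.

```r
library(Biobase)

set.seed(1)
# Toy AssayData: 4 hypothetical probes (rows) x 3 hypothetical samples (columns).
expr <- matrix(rnorm(12, mean = 8), nrow = 4,
               dimnames = list(paste0("probe", 1:4), paste0("sample", 1:3)))

# Toy PhenoData: one row per sample; row names must match the matrix columns.
pheno <- data.frame(group = c("SPTB", "TERM", "SPTB"),
                    row.names = colnames(expr))

eset <- ExpressionSet(assayData = expr,
                      phenoData = AnnotatedDataFrame(pheno))

dim(exprs(eset))     # probes x samples
pData(eset)          # sample descriptions
sampleNames(eset)    # identifiers shared by AssayData and PhenoData

# Subsetting keeps expression values and sample annotation in sync.
sptb_only <- eset[, eset$group == "SPTB"]
```

Keeping both tables in one object is the main convenience: any subset of samples carries its annotation along automatically.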
In order to analyze which genes are differentially expressed between SPTB (including sPTD & PPROM) and TERM DELIVERY (Control), I’ll have to fit a linear model to the expression data. Linear models are the workhorse for the analysis of experimental data. They can be used to analyze almost arbitrarily complex designs; however, they also take a while to learn and understand, and a thorough description is beyond the scope of this workflow.
Linear models for microarrays: I will now apply linear models to microarrays. Specifically, I’ll discuss how to use the limma package for differential expression analysis. The package is designed to analyze complex experiments involving comparisons between many experimental groups simultaneously while remaining reasonably easy to use for simple experiments. The main idea is to fit a linear model to the expression data for each gene.
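As a rough preview of what such a gene-wise fit looks like, the sketch below runs the standard limma steps on simulated data. The expression matrix, the gene and sample names and the group sizes are assumptions made purely for illustration; they are not the data of this project. The moderated statistics computed by eBayes() are the empirical Bayes step of limma.

```r
library(limma)

set.seed(42)
# Simulated log2 expression: 100 hypothetical genes x 6 samples (3 per group).
group <- factor(rep(c("TERM", "SPTB"), each = 3), levels = c("TERM", "SPTB"))
expr <- matrix(rnorm(600, mean = 8), nrow = 100,
               dimnames = list(paste0("gene", 1:100), paste0("s", 1:6)))
# Shift one gene in the SPTB group so there is something to detect.
expr[1, group == "SPTB"] <- expr[1, group == "SPTB"] + 6

# Design matrix: one column per group (no intercept).
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Contrast matrix: the comparison of interest, SPTB versus TERM.
contrast <- makeContrasts(SPTB - TERM, levels = design)

fit <- lmFit(expr, design)           # one linear model per gene
fit <- contrasts.fit(fit, contrast)  # re-express the fit in terms of the contrast
fit <- eBayes(fit)                   # moderated t-statistics
top <- topTable(fit, number = 5)     # five top-ranked genes
top
```

The design matrix describes which sample belongs to which group, while the contrast matrix states which group difference is tested; keeping the two separate makes it easy to add further comparisons later.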
Empirical Bayes and other methods are used to borrow information across genes for the residual variance estimation, leading to moderated t-statistics and stabilizing the analysis for experiments with just a small number of arrays. In the following, I’ll be using appropriate design and contrast matrices for the linear models and fit a linear model to each gene separately.
To analyze microarray data, I need packages from Bioconductor, a repository of R packages for genomic data. However, Bioconductor packages use functions and objects from various other R packages, so I need to install several R packages. Additionally, I will need an R package for making graphs of the data, called ggplot2. In order to use the installed R and Bioconductor packages in R, I have to load them first.
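Before loading anything, it can be useful to check which packages are still missing. The following is a small sketch: the helper missing_packages() is a name of my own, and the package list is only a subset chosen for illustration. The installation lines are left commented out because they need an internet connection; BiocManager::install() is the usual way to install Bioconductor (and CRAN) packages.

```r
# A few of the packages used in this workflow (illustrative subset).
pkgs <- c("Biobase", "oligo", "limma", "ggplot2")

# Hypothetical helper: return the packages that cannot be found locally.
missing_packages <- function(pkgs) {
  found <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!found]
}

to_install <- missing_packages(pkgs)
print(to_install)

# Uncomment to install whatever is missing:
# if (!requireNamespace("BiocManager", quietly = TRUE)) {
#   install.packages("BiocManager")
# }
# BiocManager::install(to_install)
```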
Bioconductor is object-oriented R. It means that a package consists of classes. The classes define the behaviour and characteristics of a set of similar objects that belong to the class. The characteristics that objects of a class can have are called slots while the behaviour of the objects (the actions they can do) is described by the methods of a class.
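The idea of classes, slots and methods can be illustrated with a toy S4 class. The class ArraySample and the generic meanIntensity below are hypothetical names invented for this sketch; they do not exist in Bioconductor. The functions setClass(), setGeneric(), setMethod(), new() and slotNames() come from the methods package that ships with R.

```r
library(methods)

# A class with two slots (the characteristics of its objects).
setClass("ArraySample",
         slots = c(name = "character", intensities = "numeric"))

# A generic function and a method for the class (the behaviour).
setGeneric("meanIntensity", function(object) standardGeneric("meanIntensity"))
setMethod("meanIntensity", "ArraySample",
          function(object) mean(object@intensities))

s <- new("ArraySample", name = "sample1", intensities = c(100, 200, 300))

slotNames(s)      # "name" "intensities"
s@name            # slots are accessed with @
meanIntensity(s)  # 200
```

Bioconductor classes such as ExpressionSet work the same way, except that accessor functions are normally used instead of reading slots directly with @.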
library(Biobase) #package that contains functions needed for microarray data analysis.
library(oligo) #A package to analyze oligonucleotide arrays at probe-level. It currently supports Affymetrix (CEL files) and standardized data structures to represent genomic data.
library(limma) #Data analysis, linear models and differential expression for microarray data.
library(gplots) #plotting data.
library(ggplot2) #Create Elegant Data Visualisations Using the Grammar of Graphics.
library(ggcorrplot) #provides a solution for reordering the correlation matrix and displays the significance level on the plot. It also includes a function for computing a matrix of correlation p-values.
library(preprocessCore) #A library of core preprocessing routines.
library(plotly) #data analytics and visualization tools.
library(wesanderson) #A Wes Anderson Palette Generator.
library(dplyr) #package which provides a set of tools for efficiently manipulating datasets.
library(ggpubr) #facilitates the creation of beautiful ggplot2-based graphs for researcher with non-advanced programming backgrounds.
library(knitr) #provides a general-purpose tool for dynamic report generation in R.
PROJECT PART I - ANALYSIS OF MICROARRAY EXPERIMENT
Microarrays can be used in many types of experiments. Gene expression profiling is by far the most common use of microarray technology. Two-colour microarrays can be used for this type of experiment. The process of analysing gene expression data begins with importing the raw data.
The list.files() command is used to obtain the list of CEL files in the folder specified by celpath. Then I import all the CEL files with a single command using the read.celfiles() function.
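If the folder contains other files besides CEL files, the listing can be restricted with the pattern argument of list.files(). The sketch below demonstrates this on a temporary folder with made-up file names, so that it runs without any real data; the folder name cel_demo and the file names are assumptions for illustration only.

```r
# Create a temporary folder with hypothetical file names.
celdir <- file.path(tempdir(), "cel_demo")
dir.create(celdir, showWarnings = FALSE)
file.create(file.path(celdir, c("sampleA.CEL", "sampleB.CEL", "notes.txt")))

# Keep only files whose names end in .CEL (case-insensitive).
cel_files <- list.files(celdir, pattern = "\\.CEL$", full.names = TRUE,
                        ignore.case = TRUE)
basename(cel_files)  # "sampleA.CEL" "sampleB.CEL"

# With real CEL files, this vector would be passed on to the import step:
# raw <- oligo::read.celfiles(cel_files)
```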
celpath <- "~/Desktop/oliver/HuGene21ST/"
#list the full paths of all files in the CEL folder
list <- list.files(celpath, full.names = TRUE)
#import CEL files containing raw probe-level data into an R FeatureSet object
data <- read.celfiles(list)
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437801_HTHuGene21_111412H_SL309_515122-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437802_HTHuGene21_111412H_SL310_515122-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437803_HTHuGene21_092712H_SL77_810384-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437804_HTHuGene21_092712H_SL78_810384-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437805_HTHuGene21_111912H_SL313_810392-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437806_HTHuGene21_111912H_SL314_810392-2A.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437807_HTHuGene21_101512H_SL181_810401-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437808_HTHuGene21_101512H_SL182_810401-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437809_HTHuGene21_101512H_SL169_810413-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437810_HTHuGene21_101512H_SL170_810413-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437811_HTHuGene21_100412H_SL139_810416-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437812_HTHuGene21_100412H_SL140_810416-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437813_HTHuGene21_091912H_SL57_810421-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437814_HTHuGene21_091912H_SL58_810421-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437815_HTHuGene21_102512H_SL225_810424-1_2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437816_HTHuGene21_102512H_SL226_810424-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437817_HTHuGene21_102912H_SL261_810430-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437818_HTHuGene21_102912H_SL262_810430-2C.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437819_HTHuGene21_101812H_SL211_810432-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437820_HTHuGene21_101812H_SL212_810432-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437821_HTHuGene21_092712H_SL81_810439-1B.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437822_HTHuGene21_092712H_SL82_810439-2A.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437823_HTHuGene21_102512H_SL235_810447-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437824_HTHuGene21_102512H_SL236_810447-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437825_HTHuGene21_100212H_SL106_810460-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437826_HTHuGene21_100212H_SL103_810460-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437827_HTHuGene21_102912H_SL253_810462-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437828_HTHuGene21_102912H_SL254_810462-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437829_HTHuGene21_111412H_SL300_810469-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437830_HTHuGene21_111412H_SL297_810469-2_2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437831_HTHuGene21_100912H_SL164_810477-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437832_HTHuGene21_100412H_SL125_810494-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437833_HTHuGene21_100412H_SL126_810494-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437834_HTHuGene21_092712H_SL73_810501-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437835_HTHuGene21_092712H_SL74_810501-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437836_HTHuGene21_092712H_SL87_810507-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437837_HTHuGene21_092712H_SL88_810507-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437838_HTHuGene21_100212H_SL99_810516-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437839_HTHuGene21_100212H_SL100_810516-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437840_HTHuGene21_100412H_SL133_810518-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437841_HTHuGene21_100412H_SL134_810518-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437842_HTHuGene21_101812H_SL202_810521-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437843_HTHuGene21_101812H_SL203_810521-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437844_HTHuGene21_102512H_SL227_810529-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437845_HTHuGene21_102512H_SL228_810529-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437846_HTHuGene21_101512H_SL183_810533-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437847_HTHuGene21_101512H_SL184_810533-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437848_HTHuGene21_101512H_SL173_810545-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437849_HTHuGene21_101512H_SL174_810545-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437850_HTHuGene21_082912H_SL19_810563-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437851_HTHuGene21_082912H_SL20_810563-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437852_HTHuGene21_102512H_SL229_810568-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437853_HTHuGene21_102512H_SL230_810568-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437854_HTHuGene21_100212H_SL109_810619-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437855_HTHuGene21_100212H_SL110_810619-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437856_HTHuGene21_091712H_SL39_810657-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437857_HTHuGene21_091712H_SL40_810657-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437858_HTHuGene21_092712H_SL95_812226-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437859_HTHuGene21_092712H_SL96_812226-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437860_HTHuGene21_082912H_SL8_812228-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437861_HTHuGene21_082912H_SL9_812228-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437862_HTHuGene21_102512H_SL217_812230-1_2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437863_HTHuGene21_102512H_SL218_812230-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437864_HTHuGene21_101512H_SL191_812232-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437865_HTHuGene21_101512H_SL192_812232-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437866_HTHuGene21_101812H_SL213_812234-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437867_HTHuGene21_101812H_SL214_812234-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437868_HTHuGene21_100212H_SL111_812235-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437869_HTHuGene21_100212H_SL112_812235-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437870_HTHuGene21_101812H_SL204_812236-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437871_HTHuGene21_101812H_SL199_812236-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437872_HTHuGene21_111412H_SL311_812249-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437873_HTHuGene21_111412H_SL312_812249-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437874_HTHuGene21_111912H_SL335_812261-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437875_HTHuGene21_111912H_SL336_812261-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437876_HTHuGene21_101512H_SL171_812268-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437877_HTHuGene21_101512H_SL172_812268-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437878_HTHuGene21_082912H_SL18_812282-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437879_HTHuGene21_100212H_SL115_812285-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437880_HTHuGene21_100212H_SL116_812285-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437881_HTHuGene21_082912H_SL10_812292-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437882_HTHuGene21_082912H_SL11_812292-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437883_HTHuGene21_111912H_SL323_812296-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437884_HTHuGene21_111912H_SL324_812296-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437885_HTHuGene21_091912H_SL49_812302-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437886_HTHuGene21_091912H_SL50_812302-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437887_HTHuGene21_110512H_SL268_812309-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437888_HTHuGene21_110512H_SL265_812309-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437889_HTHuGene21_100212H_SL97_812324-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437890_HTHuGene21_100212H_SL98_812324-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437891_HTHuGene21_102912H_SL255_812329-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437892_HTHuGene21_102912H_SL256_812329-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437893_HTHuGene21_092712H_SL75_812342-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437894_HTHuGene21_092712H_SL76_812342-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437895_HTHuGene21_110512H_SL269_812344-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437896_HTHuGene21_110512H_SL270_812344-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437897_HTHuGene21_092712H_SL89_812359-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437898_HTHuGene21_092712H_SL90_812359-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437899_HTHuGene21_092712H_SL83_812366-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437900_HTHuGene21_092712H_SL84_812366-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437901_HTHuGene21_110512H_SL277_812387-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437902_HTHuGene21_110512H_SL278_812387-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437903_HTHuGene21_101812H_SL195_812396-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437904_HTHuGene21_101812H_SL196_812396-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437905_HTHuGene21_110512H_SL283_812407-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437906_HTHuGene21_110512H_SL284_812407-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437907_HTHuGene21_100212H_SL104_812448-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437908_HTHuGene21_100212H_SL105_812448-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437909_HTHuGene21_100412H_SL123_812459-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437910_HTHuGene21_100412H_SL124_812459-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437911_HTHuGene21_102512H_SL219_812477-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437912_HTHuGene21_102512H_SL221_812477-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437913_HTHuGene21_101512H_SL177_812509-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437914_HTHuGene21_101512H_SL178_812509-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437915_HTHuGene21_100412H_SL121_812518-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437916_HTHuGene21_100412H_SL122_812518-2C.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437917_HTHuGene21_102912H_SL241_812546-1_2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437918_HTHuGene21_102912H_SL242_812546-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437919_HTHuGene21_111912H_SL315_812551-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437920_HTHuGene21_111912H_SL316_812551-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437921_HTHuGene21_100912H_SL149_812555-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437922_HTHuGene21_111412H_SL298_812559-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437923_HTHuGene21_111412H_SL299_812559-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437924_HTHuGene21_091912H_SL69_812562-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437925_HTHuGene21_091912H_SL70_812562-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437926_HTHuGene21_100912H_SL150_812566-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437927_HTHuGene21_100912H_SL151_812566-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437928_HTHuGene21_101512H_SL175_812573-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437929_HTHuGene21_101512H_SL176_812573-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437930_HTHuGene21_111412H_SL289_812574-1_2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437931_HTHuGene21_111412H_SL290_812574-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437932_HTHuGene21_100912H_SL147_812581-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437933_HTHuGene21_100912H_SL148_812581-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437934_HTHuGene21_100412H_SL129_812586-1_2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437935_HTHuGene21_100412H_SL130_812586-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437936_HTHuGene21_100912H_SL158_812587-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437937_HTHuGene21_100912H_SL159_812587-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437938_HTHuGene21_091912H_SL59_812590-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437939_HTHuGene21_091912H_SL60_812590-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437940_HTHuGene21_082912H_SL4_815072-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437941_HTHuGene21_082912H_SL5_815072-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437942_HTHuGene21_082912H_SL16_815073-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437943_HTHuGene21_082912H_SL17_815073-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437944_HTHuGene21_091712H_SL27_815076-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437945_HTHuGene21_091712H_SL28_815076-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437946_HTHuGene21_082912H_SL1_815082-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437947_HTHuGene21_110512H_SL273_815094-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437948_HTHuGene21_110512H_SL274_815094-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437949_HTHuGene21_091912H_SL63_815102-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437950_HTHuGene21_091912H_SL64_815102-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437951_HTHuGene21_091712H_SL33_815116-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437952_HTHuGene21_091712H_SL34_815116-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437953_HTHuGene21_100912H_SL156_815123-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437954_HTHuGene21_100912H_SL157_815123-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437955_HTHuGene21_091712H_SL31_815127-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437956_HTHuGene21_091712H_SL32_815127-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437957_HTHuGene21_082912H_SL2_815137-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437958_HTHuGene21_082912H_SL3_815137-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437959_HTHuGene21_091912H_SL55_815149-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437960_HTHuGene21_091912H_SL56_815149-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437961_HTHuGene21_102912H_SL245_815154-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437962_HTHuGene21_102912H_SL246_815154-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437963_HTHuGene21_082912H_SL21_815163-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437964_HTHuGene21_082912H_SL22_815163-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437965_HTHuGene21_110512H_SL285_815168-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437966_HTHuGene21_110512H_SL286_815168-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437967_HTHuGene21_091912H_SL67_815179-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437968_HTHuGene21_091912H_SL68_815179-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437969_HTHuGene21_110512H_SL266_815183-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437970_HTHuGene21_110512H_SL267_815183-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437971_HTHuGene21_102912H_SL247_815189-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437972_HTHuGene21_102912H_SL248_815189-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437973_HTHuGene21_082912H_SL14_815194-1B.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437974_HTHuGene21_082912H_SL15_815194-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437975_HTHuGene21_101512H_SL187_815196-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437976_HTHuGene21_101512H_SL188_815196-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437977_HTHuGene21_092712H_SL79_815200-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437978_HTHuGene21_092712H_SL80_815200-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437979_HTHuGene21_092712H_SL93_815218-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437980_HTHuGene21_092712H_SL94_815218-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437981_HTHuGene21_110512H_SL287_815219-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437982_HTHuGene21_110512H_SL288_815219-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437983_HTHuGene21_100412H_SL137_818022-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437984_HTHuGene21_100412H_SL138_818022-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437985_HTHuGene21_100412H_SL135_818023-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437986_HTHuGene21_100412H_SL136_818023-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437987_HTHuGene21_091912H_SL71_818025-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437988_HTHuGene21_091912H_SL72_818025-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437989_HTHuGene21_091712H_SL35_818032-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437990_HTHuGene21_091712H_SL36_818032-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437991_HTHuGene21_100412H_SL141_818034-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437992_HTHuGene21_100412H_SL142_818034-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437993_HTHuGene21_091712H_SL41_818036-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437994_HTHuGene21_091712H_SL42_818036-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437995_HTHuGene21_111912H_SL319_818046-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437996_HTHuGene21_111912H_SL320_818046-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437997_HTHuGene21_102912H_SL249_818054-1_2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437998_HTHuGene21_102912H_SL250_818054-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437999_HTHuGene21_110512H_SL279_818070-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438000_HTHuGene21_110512H_SL280_818070-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438001_HTHuGene21_101812H_SL205_818081-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438002_HTHuGene21_101812H_SL206_818081-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438003_HTHuGene21_091912H_SL51_818084-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438004_HTHuGene21_091912H_SL52_818084-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438005_HTHuGene21_092712H_SL85_818088-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438006_HTHuGene21_092712H_SL86_818088-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438007_HTHuGene21_111412H_SL301_818125-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438008_HTHuGene21_111412H_SL302_818125-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438009_HTHuGene21_091712H_SL37_818153-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438010_HTHuGene21_091712H_SL38_818153-2A.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438011_HTHuGene21_091712H_SL45_818156-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438012_HTHuGene21_091712H_SL46_818156-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438013_HTHuGene21_101512H_SL185_818162-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438014_HTHuGene21_101512H_SL186_818162-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438015_HTHuGene21_101812H_SL198_818172-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438016_HTHuGene21_101812H_SL197_818172-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438017_HTHuGene21_100212H_SL107_818174-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438018_HTHuGene21_100212H_SL108_818174-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438019_HTHuGene21_100212H_SL117_818181-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438020_HTHuGene21_100212H_SL118_818181-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438021_HTHuGene21_101512H_SL189_818195-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438022_HTHuGene21_101512H_SL190_818195-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438023_HTHuGene21_111412H_SL305_818200-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438024_HTHuGene21_111412H_SL306_818200-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438025_HTHuGene21_101812H_SL193_818224-1_2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438026_HTHuGene21_101812H_SL194_818224-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438027_HTHuGene21_101812H_SL200_818241-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438028_HTHuGene21_101812H_SL201_818241-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438029_HTHuGene21_091912H_SL53_818246-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438030_HTHuGene21_091912H_SL54_818246-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438031_HTHuGene21_101812H_SL207_818249-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438032_HTHuGene21_101812H_SL208_818249-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438033_HTHuGene21_100912H_SL152_818257-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438034_HTHuGene21_100912H_SL153_818257-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438035_HTHuGene21_100912H_SL162_818308-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438036_HTHuGene21_100912H_SL163_818308-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438037_HTHuGene21_110512H_SL275_818357-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438038_HTHuGene21_110512H_SL276_818357-2B.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438039_HTHuGene21_111912H_SL325_818361-1i.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438040_HTHuGene21_111912H_SL326_818361-2A.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438041_HTHuGene21_102512H_SL231_818368-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438042_HTHuGene21_102512H_SL232_818368-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438043_HTHuGene21_100912H_SL145_818381-1B.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438044_HTHuGene21_100912H_SL146_818381-2A.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438045_HTHuGene21_102912H_SL243_818409-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438046_HTHuGene21_102912H_SL244_818409-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438047_HTHuGene21_110512H_SL271_818481-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438048_HTHuGene21_110512H_SL272_818481-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438049_HTHuGene21_110512H_SL281_818614-1A.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438050_HTHuGene21_110512H_SL282_818614-2C.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438051_HTHuGene21_111412H_SL291_818615-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438052_HTHuGene21_111412H_SL292_818615-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438053_HTHuGene21_111412H_SL295_818626-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438054_HTHuGene21_111412H_SL296_818626-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438055_HTHuGene21_111412H_SL303_818670-1A.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438056_HTHuGene21_111412H_SL304_818670-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438057_HTHuGene21_111412H_SL307_818684-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438058_HTHuGene21_111412H_SL308_818684-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438059_HTHuGene21_111912H_SL317_818781-1C.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438060_HTHuGene21_111912H_SL318_818781-2A.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438061_HTHuGene21_111912H_SL321_818827-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438062_HTHuGene21_111912H_SL322_818827-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438063_HTHuGene21_111912H_SL329_830347-1B.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438064_HTHuGene21_111912H_SL330_830347-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438065_HTHuGene21_091912H_SL61_830356-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438066_HTHuGene21_091912H_SL62_830356-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438067_HTHuGene21_111912H_SL331_830370-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438068_HTHuGene21_111912H_SL332_830370-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438069_HTHuGene21_111412H_SL293_830381-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438070_HTHuGene21_111412H_SL294_830381-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438071_HTHuGene21_100212H_SL101_830397-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438072_HTHuGene21_100212H_SL102_830397-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438073_HTHuGene21_100212H_SL113_830398-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438074_HTHuGene21_100212H_SL114_830398-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438075_HTHuGene21_100912H_SL160_830432-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438076_HTHuGene21_100912H_SL161_830432-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438077_HTHuGene21_100412H_SL127_830446-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438078_HTHuGene21_100412H_SL128_830446-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438079_HTHuGene21_082912H_SL23_830478-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438080_HTHuGene21_082912H_SL24_830478-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438081_HTHuGene21_101512H_SL179_830505-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438082_HTHuGene21_101512H_SL180_830505-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438083_HTHuGene21_091712H_SL47_830507-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438084_HTHuGene21_091712H_SL48_830507-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438085_HTHuGene21_102512H_SL237_830515-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438086_HTHuGene21_102512H_SL238_830515-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438087_HTHuGene21_100912H_SL165_830518-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438088_HTHuGene21_100912H_SL166_830518-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438089_HTHuGene21_100412H_SL131_830538-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438090_HTHuGene21_100412H_SL132_830538-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438091_HTHuGene21_100912H_SL154_830544-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438092_HTHuGene21_100912H_SL155_830544-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438093_HTHuGene21_100912H_SL167_830554-1C.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438094_HTHuGene21_100912H_SL168_830554-2A.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438095_HTHuGene21_101812H_SL215_830560-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438096_HTHuGene21_101812H_SL216_830560-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438097_HTHuGene21_102912H_SL263_830561-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438098_HTHuGene21_102912H_SL264_830561-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438099_HTHuGene21_101812H_SL209_830575-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438100_HTHuGene21_101812H_SL210_830575-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438101_HTHuGene21_091712H_SL25_830576-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438102_HTHuGene21_091712H_SL26_830576-2A.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438103_HTHuGene21_092712H_SL91_830584-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438104_HTHuGene21_092712H_SL92_830584-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438105_HTHuGene21_100412H_SL143_830587-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438106_HTHuGene21_100412H_SL144_830587-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438107_HTHuGene21_102512H_SL220_830590-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438108_HTHuGene21_102512H_SL222_830590-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438109_HTHuGene21_082912H_SL6_830597-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438110_HTHuGene21_082912H_SL7_830597-2A.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438111_HTHuGene21_100212H_SL119_830607-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438112_HTHuGene21_100212H_SL120_830607-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438113_HTHuGene21_091712H_SL29_830656-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438114_HTHuGene21_091712H_SL30_830656-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438115_HTHuGene21_102512H_SL233_830692-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438116_HTHuGene21_102512H_SL234_830692-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438117_HTHuGene21_102512H_SL223_830741-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438118_HTHuGene21_102512H_SL224_830741-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438119_HTHuGene21_102912H_SL251_830762-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438120_HTHuGene21_102912H_SL252_830762-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438121_HTHuGene21_111912H_SL333_830790-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438122_HTHuGene21_111912H_SL334_830790-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438123_HTHuGene21_091712H_SL43_830872-1A.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438124_HTHuGene21_091712H_SL44_830872-2A.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438125_HTHuGene21_102912H_SL257_830909-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438126_HTHuGene21_102912H_SL258_830909-2.CEL
The data are now stored in a FeatureSet object (here, a GeneFeatureSet) that holds the intensities from my CEL files.
## The number of microarray probes is equal to 1416100 and the number of microarray samples is equal to 326
## The type of the raw data is an GeneFeatureSet
How can I retrieve the intensities of specific rows in the CEL files? Two methods, exprs() and intensity(), return the intensity data. Both give the same result: a matrix with the intensities of all probes (rows) on all arrays (columns).
expr <- oligo::exprs(data)
expr[1:10,1:10]
## GSM1437801_HTHuGene21_111412H_SL309_515122-1.CEL
## 1 118.0
## 2 5303.4
## 3 133.0
## 4 5557.9
## 5 102.0
## 6 117.0
## 7 124.0
## 8 206.0
## 9 97.0
## 10 72.0
## GSM1437802_HTHuGene21_111412H_SL310_515122-2.CEL
## 1 130.0
## 2 5664.5
## 3 144.0
## 4 5532.6
## 5 91.0
## 6 168.0
## 7 115.0
## 8 146.0
## 9 91.0
## 10 78.0
## GSM1437803_HTHuGene21_092712H_SL77_810384-1.CEL
## 1 104.0
## 2 4277.1
## 3 106.0
## 4 4889.8
## 5 77.0
## 6 101.0
## 7 80.0
## 8 209.0
## 9 53.0
## 10 54.0
## GSM1437804_HTHuGene21_092712H_SL78_810384-2.CEL
## 1 121.0
## 2 5610.6
## 3 145.0
## 4 6418.1
## 5 88.0
## 6 183.0
## 7 64.0
## 8 172.0
## 9 82.0
## 10 59.0
## GSM1437805_HTHuGene21_111912H_SL313_810392-1.CEL
## 1 161.0
## 2 4597.7
## 3 182.0
## 4 5007.0
## 5 100.0
## 6 173.0
## 7 121.0
## 8 198.0
## 9 86.0
## 10 84.0
## GSM1437806_HTHuGene21_111912H_SL314_810392-2A.CEL
## 1 130.0
## 2 4706.3
## 3 133.0
## 4 5155.9
## 5 105.0
## 6 145.0
## 7 83.0
## 8 133.0
## 9 58.0
## 10 59.0
## GSM1437807_HTHuGene21_101512H_SL181_810401-1.CEL
## 1 135.0
## 2 5351.1
## 3 117.0
## 4 5758.2
## 5 96.0
## 6 147.0
## 7 91.0
## 8 142.0
## 9 70.0
## 10 79.0
## GSM1437808_HTHuGene21_101512H_SL182_810401-2.CEL
## 1 152.0
## 2 6002.2
## 3 158.0
## 4 6022.5
## 5 99.0
## 6 113.0
## 7 78.0
## 8 82.0
## 9 69.0
## 10 91.0
## GSM1437809_HTHuGene21_101512H_SL169_810413-1.CEL
## 1 298.0
## 2 5577.3
## 3 292.0
## 4 5974.0
## 5 202.0
## 6 168.0
## 7 140.0
## 8 257.0
## 9 106.0
## 10 124.0
## GSM1437810_HTHuGene21_101512H_SL170_810413-2.CEL
## 1 139.0
## 2 4587.3
## 3 142.0
## 4 4813.3
## 5 95.0
## 6 112.0
## 7 73.0
## 8 117.0
## 9 66.0
## 10 69.0
How can I retrieve the intensities of only the PM (perfect match) probes for specific rows in the CEL files? I use the pm() function.
pm <- oligo::pm(data)
pm[1:10,1:10]
## GSM1437801_HTHuGene21_111412H_SL309_515122-1.CEL
## 6 117
## 7 124
## 9 97
## 10 72
## 12 108
## 15 50
## 23 140
## 25 44
## 26 43
## 27 41
## GSM1437802_HTHuGene21_111412H_SL310_515122-2.CEL
## 6 168
## 7 115
## 9 91
## 10 78
## 12 119
## 15 62
## 23 153
## 25 83
## 26 49
## 27 70
## GSM1437803_HTHuGene21_092712H_SL77_810384-1.CEL
## 6 101
## 7 80
## 9 53
## 10 54
## 12 121
## 15 51
## 23 74
## 25 36
## 26 41
## 27 40
## GSM1437804_HTHuGene21_092712H_SL78_810384-2.CEL
## 6 183
## 7 64
## 9 82
## 10 59
## 12 134
## 15 48
## 23 52
## 25 42
## 26 45
## 27 41
## GSM1437805_HTHuGene21_111912H_SL313_810392-1.CEL
## 6 173
## 7 121
## 9 86
## 10 84
## 12 137
## 15 59
## 23 58
## 25 40
## 26 43
## 27 38
## GSM1437806_HTHuGene21_111912H_SL314_810392-2A.CEL
## 6 145
## 7 83
## 9 58
## 10 59
## 12 71
## 15 55
## 23 43
## 25 45
## 26 42
## 27 43
## GSM1437807_HTHuGene21_101512H_SL181_810401-1.CEL
## 6 147
## 7 91
## 9 70
## 10 79
## 12 100
## 15 69
## 23 46
## 25 69
## 26 42
## 27 45
## GSM1437808_HTHuGene21_101512H_SL182_810401-2.CEL
## 6 113
## 7 78
## 9 69
## 10 91
## 12 79
## 15 64
## 23 78
## 25 74
## 26 57
## 27 56
## GSM1437809_HTHuGene21_101512H_SL169_810413-1.CEL
## 6 168
## 7 140
## 9 106
## 10 124
## 12 145
## 15 99
## 23 84
## 25 84
## 26 79
## 27 92
## GSM1437810_HTHuGene21_101512H_SL170_810413-2.CEL
## 6 112
## 7 73
## 9 66
## 10 69
## 12 105
## 15 56
## 23 52
## 25 59
## 26 61
## 27 51
Apart from the expression data itself, a microarray data set needs to include information about the samples that were hybridized to the arrays. This sample annotation is stored in the phenoData slot, which holds the labels for the samples. However, for most data sets the phenoData has not been defined. How can I retrieve the sample annotation of the data?
ph <- data@phenoData; ph
## An object of class 'AnnotatedDataFrame'
## rowNames: GSM1437801_HTHuGene21_111412H_SL309_515122-1.CEL
## GSM1437802_HTHuGene21_111412H_SL310_515122-2.CEL ...
## GSM1438126_HTHuGene21_102912H_SL258_830909-2.CEL (326 total)
## varLabels: index
## varMetadata: labelDescription channel
How can I retrieve the probe annotation of the data?
feat <- data@featureData
feat@data
## data frame with 0 columns and 1416100 rows
As the output shows, the featureData has not been defined either. I’ll also retrieve the number of probes represented on the arrays.
length(probeNames(data))
## [1] 1025088
# Pull the raw intensity matrix once instead of re-extracting it for every check.
raw <- Biobase::exprs(data)
# Row/column positions of missing values.
NA_values <- which(is.na(raw), arr.ind = TRUE)
# Arrays (columns) that contain at least one NaN or infinite value.
NaN_values <- which(apply(raw, 2, function(x) any(is.nan(x))))
infinite_values <- which(apply(raw, 2, function(x) any(is.infinite(x))))
# Number of blank ("") entries per array.
blank_values <- function(x) sum(x == "", na.rm = TRUE)
bvalues <- apply(raw, 2, blank_values)
# Number of arrays with at least one blank entry.
count <- sum(bvalues != 0)
| Check | Count |
|---|---|
| NA values | 0 |
| NaN values | 0 |
| Infinite values | 0 |
| Blank values | 0 |
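The checks summarized in the table can be tried out on a small matrix where the answers are known in advance. The following is a minimal base R sketch, not the author's code: the `toy` matrix and the `checks` vector are hypothetical names with invented values. It also shows one subtlety worth keeping in mind: `is.na()` returns `TRUE` for `NaN` as well, so the two need to be separated explicitly.

```r
# Toy intensity matrix (4 probes x 2 arrays) with one NA, one NaN and one Inf.
# The values are invented for illustration; they are not taken from the CEL files.
toy <- matrix(c(118, 5303.4, NA, 5557.9,
                NaN, 117, Inf, 206), nrow = 4)

# is.na() is also TRUE for NaN, so NaN entries are excluded from the NA count.
checks <- c(
  "NA values"       = sum(is.na(toy) & !is.nan(toy)),
  "NaN values"      = sum(is.nan(toy)),
  "Infinite values" = sum(is.infinite(toy))
)
print(checks)

# Row/column positions of every missing entry (NA or NaN).
print(which(is.na(toy), arr.ind = TRUE))
```

On the real data, all of these counts are expected to be zero, as in the table above.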
ph@data[ ,1] <- c("control1","control2","control3","control4","control5","control6","control7","control8","sPTD1","sPTD2","control9","control10","control11","control12","control13","control14","control15","control16","control17","control18","control19","control20","control21","control22","control23","control24","control25","control26","control27","control28","sPTD3","control29","control30","control31","control32","control33","control34","control35","control36","control37","control38","control39","control40","PPROM1","PPROM2","control41","control42","control43","control44","control45","control46","control47","control48","control49","control50","sPTD4","sPTD5","control51","control52","sPTD6","sPTD7","control53","control54","control55","control56","control57","control58","control59","control60","control61","control62","control63","control64","control65","control66","control67","control68","PPROM3","sPTD8","sPTD9","control69","control70","control71","control72","PPROM4","PPROM5","sPTD10","sPTD11","control73","control74","control75","control76","PPROM6","PPROM7","control77","control78","PPROM8","PPROM9","control79","control80","control81","control82","control83","control84","control85","control86","PPROM10","PPROM11","PPROM12","PPROM13","control87","control88","control89","control90","control91","control92","control93","control94","control95","control96","PPROM14","control97","control98","control99","control100","PPROM15","PPROM16","control101","control102","control103","control104","control105","control106","control107","control108","PPROM17","PPROM18","control109","control110","control111","control112","sPTD12","sPTD13","PPROM19","PPROM20","PPROM21","control113","control114","control115","control116","PPROM22","PPROM23","control117","control118","control119","control120","control121","control122","PPROM24","PPROM25","control123","control124","control125","control126","control127","control128","PPROM26","PPROM27","control129","control130","control131","control132","control133","control134","control135","control136","sPTD14","sPTD15","sPTD16","sPTD17","control137","control138","control139","control140","PPROM28","PPROM29","control141","control142","control143","control144","PPROM30","PPROM31","control145","control146","control147","control148","control149","control150","control151","control152","control153","control154","control155","control156","control157","control158","control159","control160","control161","control162","control163","control164","PPROM32","PPROM33","control165","control166","control167","control168","control169","control170","PPROM34","PPROM35","control171","control172","PPROM36","PPROM37","PPROM38","PPROM39","control173","control174","PPROM40","PPROM41","control175","control176","control177","control178","control179","control180","control181","control182","PPROM42","PPROM43","control183","control184","PPROM44","PPROM45","PPROM46","PPROM47","PPROM48","PPROM49","sPTD18","sPTD19","PPROM50","PPROM51","sPTD20","sPTD21","PPROM52","PPROM53","PPROM54","PPROM55","PPROM56","PPROM57","control185","control186","control187","control188","control189","control190","control191","control192","sPTD22","sPTD23","PPROM58","PPROM59","control193","control194","sPTD24","sPTD25","control195","control196","PPROM60","PPROM61","control197","control198","control199","control200","control201","control202","control203","control204","control205","control206","control207","control208","sPTD26","sPTD27","control209","control210","control211","control212","control213","control214","control215","control216","control217","control218","sPTD28","sPTD29","control219","control220","control221","control222","control223","control224","control225","control226","control227","control228","PPROM62","PPROM63","PPROM64","PPROM65","PPROM66","PPROM67","PPROM68","PPROM69")
ph
## An object of class 'AnnotatedDataFrame'
## rowNames: GSM1437801_HTHuGene21_111412H_SL309_515122-1.CEL
## GSM1437802_HTHuGene21_111412H_SL310_515122-2.CEL ...
## GSM1438126_HTHuGene21_102912H_SL258_830909-2.CEL (326 total)
## varLabels: index
## varMetadata: labelDescription channel
It’s time to create some plots to assess the quality of the data.
MA plots were originally developed for two-color arrays to detect differences between the two color labels on the same array. For single-channel arrays such as these, each array is instead compared with a pseudo-median array. The MA plot shows to what extent the variability in expression depends on the expression level.
In an MA plot, M (y-axis) is plotted against A (x-axis):
M is the difference between the log intensity of a probe on the array and the median log intensity of that probe over all arrays. Formula: M = logPMInt_array - logPMInt_medianarray
A is the average of the log intensity of a probe on that array and the median log intensity of that probe over all arrays. Formula: A = (logPMInt_array + logPMInt_medianarray)/2
I’m going to draw MA plots for the first few microarrays only, because plotting more than ten arrays is computationally expensive. The which argument allows me to specify which array to compare with the median array. Note that I didn’t use par() to combine the plots into a single panel, so that each MA plot stays large enough to read.
for(i in 1:3){
MAplot(data,which=i)
}
Ideally, the cloud of data points should be centered around M=0 (blue line). Additionally, the variability of the M values should be similar for different A values (average intensities). I also see that the spread of the cloud increases with the average intensity: the loess curve (red line) moves further and further away from M=0 when A increases. To remove (some of) this dependency, I should normalize the data.
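To make the definitions of M and A concrete, here is a minimal base R sketch that computes both quantities by hand. It uses simulated log2 intensities rather than the CEL data, and `log_pm`, `median_array` and `ma_values()` are hypothetical names introduced only for this illustration; it is not how MAplot() is implemented internally.

```r
# Simulated log2 PM intensities: 1000 probes (rows) x 4 arrays (columns).
# These numbers are made up for illustration; they are not the CEL data.
set.seed(1)
log_pm <- matrix(rnorm(4000, mean = 8, sd = 2), nrow = 1000, ncol = 4)

# Pseudo-median array: the median of each probe over all arrays.
median_array <- apply(log_pm, 1, median)

# M and A for one array compared with the pseudo-median array.
ma_values <- function(log_int, reference, array_index = 1) {
  M <- log_int[, array_index] - reference        # difference of log intensities
  A <- (log_int[, array_index] + reference) / 2  # average of log intensities
  data.frame(A = A, M = M)
}

ma1 <- ma_values(log_pm, median_array, array_index = 1)
print(summary(ma1$M))
```

Calling `plot(ma1$A, ma1$M)` followed by `abline(h = 0)` gives the same layout as the plots above: for well-behaved data the cloud is centered on M = 0 across the whole range of A.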
I’ll then check the distribution of signal values across the samples.
# Reuse the sample labels stored in ph@data[ ,1] and color each array by group:
# control = red, sPTD = green, PPROM = blue.
sample_labels <- ph@data[ ,1]
sample_cols <- ifelse(grepl("^sPTD", sample_labels), "green", ifelse(grepl("^PPROM", sample_labels), "blue", "red"))
oligo::boxplot(data, target = "core", main = "Boxplot of log2-intensities for the raw data", las = 2, names = sample_labels, col = sample_cols)
The standard method for normalization is RMA (Robust Multi-array Average), one of the few normalization methods that uses only the PM probes. To normalize the data with RMA, I call the rma() function, which background-corrects, normalizes and summarizes the probe intensities of Affymetrix arrays. Its input is a FeatureSet object and its output is an ExpressionSet object whose exprs slot holds the data matrix of normalized log2 intensities.
data.rma <- oligo::rma(data)
## Background correcting
## Normalizing
## Calculating Expression
data.matrix <- Biobase::exprs(data.rma)
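To make the "Normalizing" step more concrete, here is a minimal base-R sketch of quantile normalization, the normalization step that RMA applies between background correction and summarization. The helper quantile_normalize() and the toy matrix are my own illustrative assumptions. This is not how oligo implements the step internally, and ties are simply broken by order of appearance.

```r
# Toy quantile normalization (hypothetical helper, not oligo's implementation).
quantile_normalize <- function(x) {
  # Sort every column, then average the sorted values rank by rank.
  sorted <- apply(x, 2, sort)
  target <- unname(rowMeans(sorted))
  # Replace each value by the target value that belongs to its rank.
  ranks <- apply(x, 2, rank, ties.method = "first")
  normalized <- apply(ranks, 2, function(r) target[r])
  dimnames(normalized) <- dimnames(x)
  normalized
}

# Toy intensities: 4 probes (rows) measured on 3 arrays (columns).
toy <- matrix(c(5, 2, 3, 4,
                4, 1, 4, 2,
                3, 4, 6, 8),
              nrow = 4,
              dimnames = list(paste0("probe", 1:4), paste0("array", 1:3)))

toy.qn <- quantile_normalize(toy)
print(toy.qn)
```

After this step every array has exactly the same set of values, only assigned to different probes, which is why the boxplots of normalized arrays line up.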
Normalization can also be done with the GCRMA algorithm. GCRMA is based on RMA and shares its strengths. The two methods differ only in the background correction; all other steps are the same. GCRMA corrects for non-specific binding to the probes, whereas RMA does not take non-specific binding into account.
I will now redraw the first few MA plots, this time using the normalized data.
When I compare this plot to the one created for the raw intensities, I see a much more symmetric and even spread of the data, indicating that the dependence of the variability on the average expression level is not as strong as it was before normalization.
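The quantities behind an MA plot can also be computed by hand. The sketch below uses simulated log2 intensities (an assumption, not the study data) and compares one array with a pseudo-reference array built from the probe-wise medians. This is one common convention; the plotting function used for the figures may differ in its details.

```r
# Simulated log2 intensities: 200 probes on 3 arrays (toy data).
set.seed(42)
toy.log <- matrix(rnorm(600, mean = 8, sd = 1.5), nrow = 200,
                  dimnames = list(NULL, paste0("array", 1:3)))

# Pseudo-reference array: the probe-wise median across all arrays.
reference <- apply(toy.log, 1, median)

# M is the log-ratio against the reference, A the average log-intensity.
M <- toy.log[, "array1"] - reference
A <- (toy.log[, "array1"] + reference) / 2

plot(A, M, pch = 16, cex = 0.5, main = "MA plot: array1 vs. median reference")
abline(h = 0, col = "red")         # M = 0 means no difference from the reference
lines(lowess(A, M), col = "blue")  # smoothed trend of M along A
```

For well-normalized data the smoothed trend should stay close to the horizontal line at M = 0 over the whole range of A.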
Boxplots, like MA plots, let us compare the raw and the normalized data. I will show only the first few arrays so that the boxplots remain readable.
Without using ggplot:
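A minimal base-R sketch of such a before/after comparison is given below. The data are simulated, and the "normalization" is a simple alignment of the array medians that I use here only as a stand-in for RMA; the object names raw.toy and norm.toy are hypothetical.

```r
set.seed(7)
# Simulated log2 intensities for 4 arrays with different overall brightness.
raw.toy <- sapply(c(7.5, 8.0, 8.6, 9.1),
                  function(m) rnorm(500, mean = m, sd = 1.2))
colnames(raw.toy) <- paste0("array", 1:4)

# Crude stand-in for normalization: shift every array to the overall median.
shift <- apply(raw.toy, 2, median) - median(raw.toy)
norm.toy <- sweep(raw.toy, 2, shift)

# Draw the two sets of boxplots side by side.
op <- par(mfrow = c(1, 2))
boxplot(raw.toy,  main = "Before normalization", ylab = "log2 intensity", las = 2)
boxplot(norm.toy, main = "After normalization",  ylab = "log2 intensity", las = 2)
par(op)
```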
Using ggplot: How to create a box plot of normalized intensities?
I will now compare the sPTD and PPROM groups with the group of control women.
First, I need to tell limma which samples are replicates and which belong to different groups. To this end, I add a second column to the sample annotation that describes the source of each sample, and I name this new column level.
ph@data[ ,2] <- c("control","control","control","control","control","control","control","control","sPTD","sPTD","control","control","control","control","control","control","control","control","control","control","control","control","control","control","control","control","control","control","control","control","sPTD","control","control","control","control","control","control","control","control","control","control","control","control","PPROM","PPROM","control","control","control","control","control","control","control","control","control","control","sPTD","sPTD","control","control","sPTD","sPTD","control","control","control","control","control","control","control","control","control","control","control","control","control","control","control","control","PPROM","sPTD","sPTD","control","control","control","control","PPROM","PPROM","sPTD","sPTD","control","control","control","control","PPROM","PPROM","control","control","PPROM","PPROM","control","control","control","control","control","control","control","control","PPROM","PPROM","PPROM","PPROM","control","control","control","control","control","control","control","control","control","control","PPROM","control","control","control","control","PPROM","PPROM","control","control","control","control","control","control","control","control","PPROM","PPROM","control","control","control","control","sPTD","sPTD","PPROM","PPROM","PPROM","control","control","control","control","PPROM","PPROM","control","control","control","control","control","control","PPROM","PPROM","control","control","control","control","control","control","PPROM","PPROM","control","control","control","control","control","control","control","control","sPTD","sPTD","sPTD","sPTD","control","control","control","control","PPROM","PPROM","control","control","control","control","PPROM","PPROM","control","control","control","control","control","control","control","control","control","control","control","control","control","control","control","control","control","cont
rol","control","control","PPROM","PPROM","control","control","control","control","control","control","PPROM","PPROM","control","control","PPROM","PPROM","PPROM","PPROM","control","control","PPROM","PPROM","control","control","control","control","control","control","control","control","PPROM","PPROM","control","control","PPROM","PPROM","PPROM","PPROM","PPROM","PPROM","sPTD","sPTD","PPROM","PPROM","sPTD","sPTD","PPROM","PPROM","PPROM","PPROM","PPROM","PPROM","control","control","control","control","control","control","control","control","sPTD","sPTD","PPROM","PPROM","control","control","sPTD","sPTD","control","control","PPROM","PPROM","control","control","control","control","control","control","control","control","control","control","control","control","sPTD","sPTD","control","control","control","control","control","control","control","control","control","control","sPTD","sPTD","control","control","control","control","control","control","control","control","control","control","PPROM","PPROM","PPROM","PPROM","PPROM","PPROM","PPROM","PPROM")
colnames(ph@data)[2] <- "level"; ph@data
## index level
## GSM1437801_HTHuGene21_111412H_SL309_515122-1.CEL control1 control
## GSM1437802_HTHuGene21_111412H_SL310_515122-2.CEL control2 control
## GSM1437803_HTHuGene21_092712H_SL77_810384-1.CEL control3 control
## GSM1437804_HTHuGene21_092712H_SL78_810384-2.CEL control4 control
## GSM1437805_HTHuGene21_111912H_SL313_810392-1.CEL control5 control
## GSM1437806_HTHuGene21_111912H_SL314_810392-2A.CEL control6 control
## GSM1437807_HTHuGene21_101512H_SL181_810401-1.CEL control7 control
## GSM1437808_HTHuGene21_101512H_SL182_810401-2.CEL control8 control
## GSM1437809_HTHuGene21_101512H_SL169_810413-1.CEL sPTD1 sPTD
## GSM1437810_HTHuGene21_101512H_SL170_810413-2.CEL sPTD2 sPTD
## GSM1437811_HTHuGene21_100412H_SL139_810416-1.CEL control9 control
## GSM1437812_HTHuGene21_100412H_SL140_810416-2.CEL control10 control
## GSM1437813_HTHuGene21_091912H_SL57_810421-1.CEL control11 control
## GSM1437814_HTHuGene21_091912H_SL58_810421-2.CEL control12 control
## GSM1437815_HTHuGene21_102512H_SL225_810424-1_2.CEL control13 control
## GSM1437816_HTHuGene21_102512H_SL226_810424-2.CEL control14 control
## GSM1437817_HTHuGene21_102912H_SL261_810430-1.CEL control15 control
## GSM1437818_HTHuGene21_102912H_SL262_810430-2C.CEL control16 control
## GSM1437819_HTHuGene21_101812H_SL211_810432-1.CEL control17 control
## GSM1437820_HTHuGene21_101812H_SL212_810432-2.CEL control18 control
## GSM1437821_HTHuGene21_092712H_SL81_810439-1B.CEL control19 control
## GSM1437822_HTHuGene21_092712H_SL82_810439-2A.CEL control20 control
## GSM1437823_HTHuGene21_102512H_SL235_810447-1.CEL control21 control
## GSM1437824_HTHuGene21_102512H_SL236_810447-2.CEL control22 control
## GSM1437825_HTHuGene21_100212H_SL106_810460-1.CEL control23 control
## GSM1437826_HTHuGene21_100212H_SL103_810460-2.CEL control24 control
## GSM1437827_HTHuGene21_102912H_SL253_810462-1.CEL control25 control
## GSM1437828_HTHuGene21_102912H_SL254_810462-2.CEL control26 control
## GSM1437829_HTHuGene21_111412H_SL300_810469-1.CEL control27 control
## GSM1437830_HTHuGene21_111412H_SL297_810469-2_2.CEL control28 control
## GSM1437831_HTHuGene21_100912H_SL164_810477-1.CEL sPTD3 sPTD
## GSM1437832_HTHuGene21_100412H_SL125_810494-1.CEL control29 control
## GSM1437833_HTHuGene21_100412H_SL126_810494-2.CEL control30 control
## GSM1437834_HTHuGene21_092712H_SL73_810501-1.CEL control31 control
## GSM1437835_HTHuGene21_092712H_SL74_810501-2.CEL control32 control
## GSM1437836_HTHuGene21_092712H_SL87_810507-1.CEL control33 control
## GSM1437837_HTHuGene21_092712H_SL88_810507-2.CEL control34 control
## GSM1437838_HTHuGene21_100212H_SL99_810516-1.CEL control35 control
## GSM1437839_HTHuGene21_100212H_SL100_810516-2.CEL control36 control
## GSM1437840_HTHuGene21_100412H_SL133_810518-1.CEL control37 control
## GSM1437841_HTHuGene21_100412H_SL134_810518-2.CEL control38 control
## GSM1437842_HTHuGene21_101812H_SL202_810521-1.CEL control39 control
## GSM1437843_HTHuGene21_101812H_SL203_810521-2.CEL control40 control
## GSM1437844_HTHuGene21_102512H_SL227_810529-1.CEL PPROM1 PPROM
## GSM1437845_HTHuGene21_102512H_SL228_810529-2.CEL PPROM2 PPROM
## GSM1437846_HTHuGene21_101512H_SL183_810533-1.CEL control41 control
## GSM1437847_HTHuGene21_101512H_SL184_810533-2.CEL control42 control
## GSM1437848_HTHuGene21_101512H_SL173_810545-1.CEL control43 control
## GSM1437849_HTHuGene21_101512H_SL174_810545-2.CEL control44 control
## GSM1437850_HTHuGene21_082912H_SL19_810563-1.CEL control45 control
## GSM1437851_HTHuGene21_082912H_SL20_810563-2.CEL control46 control
## GSM1437852_HTHuGene21_102512H_SL229_810568-1.CEL control47 control
## GSM1437853_HTHuGene21_102512H_SL230_810568-2.CEL control48 control
## GSM1437854_HTHuGene21_100212H_SL109_810619-1.CEL control49 control
## GSM1437855_HTHuGene21_100212H_SL110_810619-2.CEL control50 control
## GSM1437856_HTHuGene21_091712H_SL39_810657-1.CEL sPTD4 sPTD
## GSM1437857_HTHuGene21_091712H_SL40_810657-2.CEL sPTD5 sPTD
## GSM1437858_HTHuGene21_092712H_SL95_812226-1.CEL control51 control
## GSM1437859_HTHuGene21_092712H_SL96_812226-2.CEL control52 control
## GSM1437860_HTHuGene21_082912H_SL8_812228-1.CEL sPTD6 sPTD
## GSM1437861_HTHuGene21_082912H_SL9_812228-2.CEL sPTD7 sPTD
## GSM1437862_HTHuGene21_102512H_SL217_812230-1_2.CEL control53 control
## GSM1437863_HTHuGene21_102512H_SL218_812230-2.CEL control54 control
## GSM1437864_HTHuGene21_101512H_SL191_812232-1.CEL control55 control
## GSM1437865_HTHuGene21_101512H_SL192_812232-2.CEL control56 control
## GSM1437866_HTHuGene21_101812H_SL213_812234-1.CEL control57 control
## GSM1437867_HTHuGene21_101812H_SL214_812234-2.CEL control58 control
## GSM1437868_HTHuGene21_100212H_SL111_812235-1.CEL control59 control
## GSM1437869_HTHuGene21_100212H_SL112_812235-2.CEL control60 control
## GSM1437870_HTHuGene21_101812H_SL204_812236-1.CEL control61 control
## GSM1437871_HTHuGene21_101812H_SL199_812236-2.CEL control62 control
## GSM1437872_HTHuGene21_111412H_SL311_812249-1.CEL control63 control
## GSM1437873_HTHuGene21_111412H_SL312_812249-2.CEL control64 control
## GSM1437874_HTHuGene21_111912H_SL335_812261-1.CEL control65 control
## GSM1437875_HTHuGene21_111912H_SL336_812261-2.CEL control66 control
## GSM1437876_HTHuGene21_101512H_SL171_812268-1.CEL control67 control
## GSM1437877_HTHuGene21_101512H_SL172_812268-2.CEL control68 control
## GSM1437878_HTHuGene21_082912H_SL18_812282-1.CEL PPROM3 PPROM
## GSM1437879_HTHuGene21_100212H_SL115_812285-1.CEL sPTD8 sPTD
## GSM1437880_HTHuGene21_100212H_SL116_812285-2.CEL sPTD9 sPTD
## GSM1437881_HTHuGene21_082912H_SL10_812292-1.CEL control69 control
## GSM1437882_HTHuGene21_082912H_SL11_812292-2.CEL control70 control
## GSM1437883_HTHuGene21_111912H_SL323_812296-1.CEL control71 control
## GSM1437884_HTHuGene21_111912H_SL324_812296-2.CEL control72 control
## GSM1437885_HTHuGene21_091912H_SL49_812302-1.CEL PPROM4 PPROM
## GSM1437886_HTHuGene21_091912H_SL50_812302-2.CEL PPROM5 PPROM
## GSM1437887_HTHuGene21_110512H_SL268_812309-1.CEL sPTD10 sPTD
## GSM1437888_HTHuGene21_110512H_SL265_812309-2.CEL sPTD11 sPTD
## GSM1437889_HTHuGene21_100212H_SL97_812324-1.CEL control73 control
## GSM1437890_HTHuGene21_100212H_SL98_812324-2.CEL control74 control
## GSM1437891_HTHuGene21_102912H_SL255_812329-1.CEL control75 control
## GSM1437892_HTHuGene21_102912H_SL256_812329-2.CEL control76 control
## GSM1437893_HTHuGene21_092712H_SL75_812342-1.CEL PPROM6 PPROM
## GSM1437894_HTHuGene21_092712H_SL76_812342-2.CEL PPROM7 PPROM
## GSM1437895_HTHuGene21_110512H_SL269_812344-1.CEL control77 control
## GSM1437896_HTHuGene21_110512H_SL270_812344-2.CEL control78 control
## GSM1437897_HTHuGene21_092712H_SL89_812359-1.CEL PPROM8 PPROM
## GSM1437898_HTHuGene21_092712H_SL90_812359-2.CEL PPROM9 PPROM
## GSM1437899_HTHuGene21_092712H_SL83_812366-1.CEL control79 control
## GSM1437900_HTHuGene21_092712H_SL84_812366-2.CEL control80 control
## GSM1437901_HTHuGene21_110512H_SL277_812387-1.CEL control81 control
## GSM1437902_HTHuGene21_110512H_SL278_812387-2.CEL control82 control
## GSM1437903_HTHuGene21_101812H_SL195_812396-1.CEL control83 control
## GSM1437904_HTHuGene21_101812H_SL196_812396-2.CEL control84 control
## GSM1437905_HTHuGene21_110512H_SL283_812407-1.CEL control85 control
## GSM1437906_HTHuGene21_110512H_SL284_812407-2.CEL control86 control
## GSM1437907_HTHuGene21_100212H_SL104_812448-1.CEL PPROM10 PPROM
## GSM1437908_HTHuGene21_100212H_SL105_812448-2.CEL PPROM11 PPROM
## GSM1437909_HTHuGene21_100412H_SL123_812459-1.CEL PPROM12 PPROM
## GSM1437910_HTHuGene21_100412H_SL124_812459-2.CEL PPROM13 PPROM
## GSM1437911_HTHuGene21_102512H_SL219_812477-1.CEL control87 control
## GSM1437912_HTHuGene21_102512H_SL221_812477-2.CEL control88 control
## GSM1437913_HTHuGene21_101512H_SL177_812509-1.CEL control89 control
## GSM1437914_HTHuGene21_101512H_SL178_812509-2.CEL control90 control
## GSM1437915_HTHuGene21_100412H_SL121_812518-1.CEL control91 control
## GSM1437916_HTHuGene21_100412H_SL122_812518-2C.CEL control92 control
## GSM1437917_HTHuGene21_102912H_SL241_812546-1_2.CEL control93 control
## GSM1437918_HTHuGene21_102912H_SL242_812546-2.CEL control94 control
## GSM1437919_HTHuGene21_111912H_SL315_812551-1.CEL control95 control
## GSM1437920_HTHuGene21_111912H_SL316_812551-2.CEL control96 control
## GSM1437921_HTHuGene21_100912H_SL149_812555-1.CEL PPROM14 PPROM
## GSM1437922_HTHuGene21_111412H_SL298_812559-1.CEL control97 control
## GSM1437923_HTHuGene21_111412H_SL299_812559-2.CEL control98 control
## GSM1437924_HTHuGene21_091912H_SL69_812562-1.CEL control99 control
## GSM1437925_HTHuGene21_091912H_SL70_812562-2.CEL control100 control
## GSM1437926_HTHuGene21_100912H_SL150_812566-1.CEL PPROM15 PPROM
## GSM1437927_HTHuGene21_100912H_SL151_812566-2.CEL PPROM16 PPROM
## GSM1437928_HTHuGene21_101512H_SL175_812573-1.CEL control101 control
## GSM1437929_HTHuGene21_101512H_SL176_812573-2.CEL control102 control
## GSM1437930_HTHuGene21_111412H_SL289_812574-1_2.CEL control103 control
## GSM1437931_HTHuGene21_111412H_SL290_812574-2.CEL control104 control
## GSM1437932_HTHuGene21_100912H_SL147_812581-1.CEL control105 control
## GSM1437933_HTHuGene21_100912H_SL148_812581-2.CEL control106 control
## GSM1437934_HTHuGene21_100412H_SL129_812586-1_2.CEL control107 control
## GSM1437935_HTHuGene21_100412H_SL130_812586-2.CEL control108 control
## GSM1437936_HTHuGene21_100912H_SL158_812587-1.CEL PPROM17 PPROM
## GSM1437937_HTHuGene21_100912H_SL159_812587-2.CEL PPROM18 PPROM
## GSM1437938_HTHuGene21_091912H_SL59_812590-1.CEL control109 control
## GSM1437939_HTHuGene21_091912H_SL60_812590-2.CEL control110 control
## GSM1437940_HTHuGene21_082912H_SL4_815072-1.CEL control111 control
## GSM1437941_HTHuGene21_082912H_SL5_815072-2.CEL control112 control
## GSM1437942_HTHuGene21_082912H_SL16_815073-1.CEL sPTD12 sPTD
## GSM1437943_HTHuGene21_082912H_SL17_815073-2.CEL sPTD13 sPTD
## GSM1437944_HTHuGene21_091712H_SL27_815076-1.CEL PPROM19 PPROM
## GSM1437945_HTHuGene21_091712H_SL28_815076-2.CEL PPROM20 PPROM
## GSM1437946_HTHuGene21_082912H_SL1_815082-1.CEL PPROM21 PPROM
## GSM1437947_HTHuGene21_110512H_SL273_815094-1.CEL control113 control
## GSM1437948_HTHuGene21_110512H_SL274_815094-2.CEL control114 control
## GSM1437949_HTHuGene21_091912H_SL63_815102-1.CEL control115 control
## GSM1437950_HTHuGene21_091912H_SL64_815102-2.CEL control116 control
## GSM1437951_HTHuGene21_091712H_SL33_815116-1.CEL PPROM22 PPROM
## GSM1437952_HTHuGene21_091712H_SL34_815116-2.CEL PPROM23 PPROM
## GSM1437953_HTHuGene21_100912H_SL156_815123-1.CEL control117 control
## GSM1437954_HTHuGene21_100912H_SL157_815123-2.CEL control118 control
## GSM1437955_HTHuGene21_091712H_SL31_815127-1.CEL control119 control
## GSM1437956_HTHuGene21_091712H_SL32_815127-2.CEL control120 control
## GSM1437957_HTHuGene21_082912H_SL2_815137-1.CEL control121 control
## GSM1437958_HTHuGene21_082912H_SL3_815137-2.CEL control122 control
## GSM1437959_HTHuGene21_091912H_SL55_815149-1.CEL PPROM24 PPROM
## GSM1437960_HTHuGene21_091912H_SL56_815149-2.CEL PPROM25 PPROM
## GSM1437961_HTHuGene21_102912H_SL245_815154-1.CEL control123 control
## GSM1437962_HTHuGene21_102912H_SL246_815154-2.CEL control124 control
## GSM1437963_HTHuGene21_082912H_SL21_815163-1.CEL control125 control
## GSM1437964_HTHuGene21_082912H_SL22_815163-2.CEL control126 control
## GSM1437965_HTHuGene21_110512H_SL285_815168-1.CEL control127 control
## GSM1437966_HTHuGene21_110512H_SL286_815168-2.CEL control128 control
## GSM1437967_HTHuGene21_091912H_SL67_815179-1.CEL PPROM26 PPROM
## GSM1437968_HTHuGene21_091912H_SL68_815179-2.CEL PPROM27 PPROM
## GSM1437969_HTHuGene21_110512H_SL266_815183-1.CEL control129 control
## GSM1437970_HTHuGene21_110512H_SL267_815183-2.CEL control130 control
## GSM1437971_HTHuGene21_102912H_SL247_815189-1.CEL control131 control
## GSM1437972_HTHuGene21_102912H_SL248_815189-2.CEL control132 control
## GSM1437973_HTHuGene21_082912H_SL14_815194-1B.CEL control133 control
## GSM1437974_HTHuGene21_082912H_SL15_815194-2.CEL control134 control
## GSM1437975_HTHuGene21_101512H_SL187_815196-1.CEL control135 control
## GSM1437976_HTHuGene21_101512H_SL188_815196-2.CEL control136 control
## GSM1437977_HTHuGene21_092712H_SL79_815200-1.CEL sPTD14 sPTD
## GSM1437978_HTHuGene21_092712H_SL80_815200-2.CEL sPTD15 sPTD
## GSM1437979_HTHuGene21_092712H_SL93_815218-1.CEL sPTD16 sPTD
## GSM1437980_HTHuGene21_092712H_SL94_815218-2.CEL sPTD17 sPTD
## GSM1437981_HTHuGene21_110512H_SL287_815219-1.CEL control137 control
## GSM1437982_HTHuGene21_110512H_SL288_815219-2.CEL control138 control
## GSM1437983_HTHuGene21_100412H_SL137_818022-1.CEL control139 control
## GSM1437984_HTHuGene21_100412H_SL138_818022-2.CEL control140 control
## GSM1437985_HTHuGene21_100412H_SL135_818023-1.CEL PPROM28 PPROM
## GSM1437986_HTHuGene21_100412H_SL136_818023-2.CEL PPROM29 PPROM
## GSM1437987_HTHuGene21_091912H_SL71_818025-1.CEL control141 control
## GSM1437988_HTHuGene21_091912H_SL72_818025-2.CEL control142 control
## GSM1437989_HTHuGene21_091712H_SL35_818032-1.CEL control143 control
## GSM1437990_HTHuGene21_091712H_SL36_818032-2.CEL control144 control
## GSM1437991_HTHuGene21_100412H_SL141_818034-1.CEL PPROM30 PPROM
## GSM1437992_HTHuGene21_100412H_SL142_818034-2.CEL PPROM31 PPROM
## GSM1437993_HTHuGene21_091712H_SL41_818036-1.CEL control145 control
## GSM1437994_HTHuGene21_091712H_SL42_818036-2.CEL control146 control
## GSM1437995_HTHuGene21_111912H_SL319_818046-1.CEL control147 control
## GSM1437996_HTHuGene21_111912H_SL320_818046-2.CEL control148 control
## GSM1437997_HTHuGene21_102912H_SL249_818054-1_2.CEL control149 control
## GSM1437998_HTHuGene21_102912H_SL250_818054-2.CEL control150 control
## GSM1437999_HTHuGene21_110512H_SL279_818070-1.CEL control151 control
## GSM1438000_HTHuGene21_110512H_SL280_818070-2.CEL control152 control
## GSM1438001_HTHuGene21_101812H_SL205_818081-1.CEL control153 control
## GSM1438002_HTHuGene21_101812H_SL206_818081-2.CEL control154 control
## GSM1438003_HTHuGene21_091912H_SL51_818084-1.CEL control155 control
## GSM1438004_HTHuGene21_091912H_SL52_818084-2.CEL control156 control
## GSM1438005_HTHuGene21_092712H_SL85_818088-1.CEL control157 control
## GSM1438006_HTHuGene21_092712H_SL86_818088-2.CEL control158 control
## GSM1438007_HTHuGene21_111412H_SL301_818125-1.CEL control159 control
## GSM1438008_HTHuGene21_111412H_SL302_818125-2.CEL control160 control
## GSM1438009_HTHuGene21_091712H_SL37_818153-1.CEL control161 control
## GSM1438010_HTHuGene21_091712H_SL38_818153-2A.CEL control162 control
## GSM1438011_HTHuGene21_091712H_SL45_818156-1.CEL control163 control
## GSM1438012_HTHuGene21_091712H_SL46_818156-2.CEL control164 control
## GSM1438013_HTHuGene21_101512H_SL185_818162-1.CEL PPROM32 PPROM
## GSM1438014_HTHuGene21_101512H_SL186_818162-2.CEL PPROM33 PPROM
## GSM1438015_HTHuGene21_101812H_SL198_818172-1.CEL control165 control
## GSM1438016_HTHuGene21_101812H_SL197_818172-2.CEL control166 control
## GSM1438017_HTHuGene21_100212H_SL107_818174-1.CEL control167 control
## GSM1438018_HTHuGene21_100212H_SL108_818174-2.CEL control168 control
## GSM1438019_HTHuGene21_100212H_SL117_818181-1.CEL control169 control
## GSM1438020_HTHuGene21_100212H_SL118_818181-2.CEL control170 control
## GSM1438021_HTHuGene21_101512H_SL189_818195-1.CEL PPROM34 PPROM
## GSM1438022_HTHuGene21_101512H_SL190_818195-2.CEL PPROM35 PPROM
## GSM1438023_HTHuGene21_111412H_SL305_818200-1.CEL control171 control
## GSM1438024_HTHuGene21_111412H_SL306_818200-2.CEL control172 control
## GSM1438025_HTHuGene21_101812H_SL193_818224-1_2.CEL PPROM36 PPROM
## GSM1438026_HTHuGene21_101812H_SL194_818224-2.CEL PPROM37 PPROM
## GSM1438027_HTHuGene21_101812H_SL200_818241-1.CEL PPROM38 PPROM
## GSM1438028_HTHuGene21_101812H_SL201_818241-2.CEL PPROM39 PPROM
## GSM1438029_HTHuGene21_091912H_SL53_818246-1.CEL control173 control
## GSM1438030_HTHuGene21_091912H_SL54_818246-2.CEL control174 control
## GSM1438031_HTHuGene21_101812H_SL207_818249-1.CEL PPROM40 PPROM
## GSM1438032_HTHuGene21_101812H_SL208_818249-2.CEL PPROM41 PPROM
## GSM1438033_HTHuGene21_100912H_SL152_818257-1.CEL control175 control
## GSM1438034_HTHuGene21_100912H_SL153_818257-2.CEL control176 control
## GSM1438035_HTHuGene21_100912H_SL162_818308-1.CEL control177 control
## GSM1438036_HTHuGene21_100912H_SL163_818308-2.CEL control178 control
## GSM1438037_HTHuGene21_110512H_SL275_818357-1.CEL control179 control
## GSM1438038_HTHuGene21_110512H_SL276_818357-2B.CEL control180 control
## GSM1438039_HTHuGene21_111912H_SL325_818361-1i.CEL control181 control
## GSM1438040_HTHuGene21_111912H_SL326_818361-2A.CEL control182 control
## GSM1438041_HTHuGene21_102512H_SL231_818368-1.CEL PPROM42 PPROM
## GSM1438042_HTHuGene21_102512H_SL232_818368-2.CEL PPROM43 PPROM
## GSM1438043_HTHuGene21_100912H_SL145_818381-1B.CEL control183 control
## GSM1438044_HTHuGene21_100912H_SL146_818381-2A.CEL control184 control
## GSM1438045_HTHuGene21_102912H_SL243_818409-1.CEL PPROM44 PPROM
## GSM1438046_HTHuGene21_102912H_SL244_818409-2.CEL PPROM45 PPROM
## GSM1438047_HTHuGene21_110512H_SL271_818481-1.CEL PPROM46 PPROM
## GSM1438048_HTHuGene21_110512H_SL272_818481-2.CEL PPROM47 PPROM
## GSM1438049_HTHuGene21_110512H_SL281_818614-1A.CEL PPROM48 PPROM
## GSM1438050_HTHuGene21_110512H_SL282_818614-2C.CEL PPROM49 PPROM
## GSM1438051_HTHuGene21_111412H_SL291_818615-1.CEL sPTD18 sPTD
## GSM1438052_HTHuGene21_111412H_SL292_818615-2.CEL sPTD19 sPTD
## GSM1438053_HTHuGene21_111412H_SL295_818626-1.CEL PPROM50 PPROM
## GSM1438054_HTHuGene21_111412H_SL296_818626-2.CEL PPROM51 PPROM
## GSM1438055_HTHuGene21_111412H_SL303_818670-1A.CEL sPTD20 sPTD
## GSM1438056_HTHuGene21_111412H_SL304_818670-2.CEL sPTD21 sPTD
## GSM1438057_HTHuGene21_111412H_SL307_818684-1.CEL PPROM52 PPROM
## GSM1438058_HTHuGene21_111412H_SL308_818684-2.CEL PPROM53 PPROM
## GSM1438059_HTHuGene21_111912H_SL317_818781-1C.CEL PPROM54 PPROM
## GSM1438060_HTHuGene21_111912H_SL318_818781-2A.CEL PPROM55 PPROM
## GSM1438061_HTHuGene21_111912H_SL321_818827-1.CEL PPROM56 PPROM
## GSM1438062_HTHuGene21_111912H_SL322_818827-2.CEL PPROM57 PPROM
## GSM1438063_HTHuGene21_111912H_SL329_830347-1B.CEL control185 control
## GSM1438064_HTHuGene21_111912H_SL330_830347-2.CEL control186 control
## GSM1438065_HTHuGene21_091912H_SL61_830356-1.CEL control187 control
## GSM1438066_HTHuGene21_091912H_SL62_830356-2.CEL control188 control
## GSM1438067_HTHuGene21_111912H_SL331_830370-1.CEL control189 control
## GSM1438068_HTHuGene21_111912H_SL332_830370-2.CEL control190 control
## GSM1438069_HTHuGene21_111412H_SL293_830381-1.CEL control191 control
## GSM1438070_HTHuGene21_111412H_SL294_830381-2.CEL control192 control
## GSM1438071_HTHuGene21_100212H_SL101_830397-1.CEL sPTD22 sPTD
## GSM1438072_HTHuGene21_100212H_SL102_830397-2.CEL sPTD23 sPTD
## GSM1438073_HTHuGene21_100212H_SL113_830398-1.CEL PPROM58 PPROM
## GSM1438074_HTHuGene21_100212H_SL114_830398-2.CEL PPROM59 PPROM
## GSM1438075_HTHuGene21_100912H_SL160_830432-1.CEL control193 control
## GSM1438076_HTHuGene21_100912H_SL161_830432-2.CEL control194 control
## GSM1438077_HTHuGene21_100412H_SL127_830446-1.CEL sPTD24 sPTD
## GSM1438078_HTHuGene21_100412H_SL128_830446-2.CEL sPTD25 sPTD
## GSM1438079_HTHuGene21_082912H_SL23_830478-1.CEL control195 control
## GSM1438080_HTHuGene21_082912H_SL24_830478-2.CEL control196 control
## GSM1438081_HTHuGene21_101512H_SL179_830505-1.CEL PPROM60 PPROM
## GSM1438082_HTHuGene21_101512H_SL180_830505-2.CEL PPROM61 PPROM
## GSM1438083_HTHuGene21_091712H_SL47_830507-1.CEL control197 control
## GSM1438084_HTHuGene21_091712H_SL48_830507-2.CEL control198 control
## GSM1438085_HTHuGene21_102512H_SL237_830515-1.CEL control199 control
## GSM1438086_HTHuGene21_102512H_SL238_830515-2.CEL control200 control
## GSM1438087_HTHuGene21_100912H_SL165_830518-1.CEL control201 control
## GSM1438088_HTHuGene21_100912H_SL166_830518-2.CEL control202 control
## GSM1438089_HTHuGene21_100412H_SL131_830538-1.CEL control203 control
## GSM1438090_HTHuGene21_100412H_SL132_830538-2.CEL control204 control
## GSM1438091_HTHuGene21_100912H_SL154_830544-1.CEL control205 control
## GSM1438092_HTHuGene21_100912H_SL155_830544-2.CEL control206 control
## GSM1438093_HTHuGene21_100912H_SL167_830554-1C.CEL control207 control
## GSM1438094_HTHuGene21_100912H_SL168_830554-2A.CEL control208 control
## GSM1438095_HTHuGene21_101812H_SL215_830560-1.CEL sPTD26 sPTD
## GSM1438096_HTHuGene21_101812H_SL216_830560-2.CEL sPTD27 sPTD
## GSM1438097_HTHuGene21_102912H_SL263_830561-1.CEL control209 control
## GSM1438098_HTHuGene21_102912H_SL264_830561-2.CEL control210 control
## GSM1438099_HTHuGene21_101812H_SL209_830575-1.CEL control211 control
## GSM1438100_HTHuGene21_101812H_SL210_830575-2.CEL control212 control
## GSM1438101_HTHuGene21_091712H_SL25_830576-1.CEL control213 control
## GSM1438102_HTHuGene21_091712H_SL26_830576-2A.CEL control214 control
## GSM1438103_HTHuGene21_092712H_SL91_830584-1.CEL control215 control
## GSM1438104_HTHuGene21_092712H_SL92_830584-2.CEL control216 control
## GSM1438105_HTHuGene21_100412H_SL143_830587-1.CEL control217 control
## GSM1438106_HTHuGene21_100412H_SL144_830587-2.CEL control218 control
## GSM1438107_HTHuGene21_102512H_SL220_830590-1.CEL sPTD28 sPTD
## GSM1438108_HTHuGene21_102512H_SL222_830590-2.CEL sPTD29 sPTD
## GSM1438109_HTHuGene21_082912H_SL6_830597-1.CEL control219 control
## GSM1438110_HTHuGene21_082912H_SL7_830597-2A.CEL control220 control
## GSM1438111_HTHuGene21_100212H_SL119_830607-1.CEL control221 control
## GSM1438112_HTHuGene21_100212H_SL120_830607-2.CEL control222 control
## GSM1438113_HTHuGene21_091712H_SL29_830656-1.CEL control223 control
## GSM1438114_HTHuGene21_091712H_SL30_830656-2.CEL control224 control
## GSM1438115_HTHuGene21_102512H_SL233_830692-1.CEL control225 control
## GSM1438116_HTHuGene21_102512H_SL234_830692-2.CEL control226 control
## GSM1438117_HTHuGene21_102512H_SL223_830741-1.CEL control227 control
## GSM1438118_HTHuGene21_102512H_SL224_830741-2.CEL control228 control
## GSM1438119_HTHuGene21_102912H_SL251_830762-1.CEL PPROM62 PPROM
## GSM1438120_HTHuGene21_102912H_SL252_830762-2.CEL PPROM63 PPROM
## GSM1438121_HTHuGene21_111912H_SL333_830790-1.CEL PPROM64 PPROM
## GSM1438122_HTHuGene21_111912H_SL334_830790-2.CEL PPROM65 PPROM
## GSM1438123_HTHuGene21_091712H_SL43_830872-1A.CEL PPROM66 PPROM
## GSM1438124_HTHuGene21_091712H_SL44_830872-2A.CEL PPROM67 PPROM
## GSM1438125_HTHuGene21_102912H_SL257_830909-1.CEL PPROM68 PPROM
## GSM1438126_HTHuGene21_102912H_SL258_830909-2.CEL PPROM69 PPROM
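Since every label in the index column consists of the group name followed by a running number, the group labels could also be derived instead of typed by hand. The following is only a sketch under that assumption; the short index vector is a hypothetical stand-in for the first annotation column.

```r
# Hypothetical stand-in for the first annotation column (sample labels).
index <- c("control1", "control2", "sPTD1", "PPROM1", "control100", "PPROM69")

# Strip the trailing running number to recover the group label.
level <- sub("[0-9]+$", "", index)
print(level)

# Cross-check: count the samples per group.
print(table(level))
```

Applied to the real annotation, the table of counts offers a quick check that no sample ended up in an unexpected group.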
The factor that determines the grouping therefore has three levels: control, sPTD and PPROM.
groups <- ph@data$level
f <- factor(groups,levels=c("control","sPTD","PPROM"))
Next, I need to create a design matrix, which encodes the grouping variable as a matrix of indicator values. ANOVA needs such a matrix to know which samples belong to which group, and because limma fits an ANOVA-type linear model, it needs one too. I create it with the model.matrix() function, whose argument is a model formula. The formula ~ 0 + f drops the intercept, so that each column of the design matrix corresponds to one group.
design <- model.matrix(~ 0 + f)
colnames(design) <- levels(f)
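To see what such a design matrix looks like, here is a small sketch with six made-up samples (the vector groups.toy is an assumption for illustration). Each row has a single 1 in the column of the group the sample belongs to.

```r
# Six hypothetical samples from the three groups.
groups.toy <- c("control", "control", "sPTD", "PPROM", "control", "PPROM")
f.toy <- factor(groups.toy, levels = c("control", "sPTD", "PPROM"))

# No-intercept design: one indicator column per group.
design.toy <- model.matrix(~ 0 + f.toy)
colnames(design.toy) <- levels(f.toy)
print(design.toy)

# The column sums give the number of samples per group.
print(colSums(design.toy))
```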
# lmFit(): fit a linear model for each gene given a series of arrays.
# Arguments:
#   object: a matrix-like data object containing log-ratios or log-expression values for a series of arrays, with rows corresponding to genes and columns to samples. Any type of data object that can be processed by getEAWP is acceptable.
#   design: the design matrix of the microarray experiment, with rows corresponding to arrays and columns to coefficients to be estimated. Defaults to the unit vector, meaning that the arrays are treated as replicates.
data.fit <- lmFit(data.rma, design)
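Conceptually, the fit for a single gene is an ordinary least squares regression of its log-expression values on the design matrix. The sketch below illustrates this for one simulated gene with base R's lm.fit(). The toy data and object names are assumptions, and limma's own implementation handles all genes at once rather than one by one.

```r
set.seed(3)
# Toy grouping: 4 control, 3 sPTD and 3 PPROM arrays.
f.toy <- factor(rep(c("control", "sPTD", "PPROM"), times = c(4, 3, 3)),
                levels = c("control", "sPTD", "PPROM"))
design.toy <- model.matrix(~ 0 + f.toy)
colnames(design.toy) <- levels(f.toy)

# Simulated log-expression values of one gene across the 10 toy arrays.
y <- rnorm(10, mean = rep(c(5, 6, 5.5), times = c(4, 3, 3)), sd = 0.3)

# Ordinary least squares for this single gene.
fit <- lm.fit(design.toy, y)

# With a no-intercept group design, the coefficients are the group means.
group.means <- tapply(y, f.toy, mean)
print(fit$coefficients)
print(group.means)

# Residual standard deviation of the gene.
sigma.toy <- sqrt(sum(fit$residuals^2) / fit$df.residual)
print(sigma.toy)
```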
Next, I need to tell limma which groups I want to compare. For this, I define the contrasts of interest in a contrast matrix using the makeContrasts() function.
#makeContrasts() -> Construct the contrast matrix corresponding to specified contrasts of a set of parameters.
cont.matrix <- makeContrasts(a=sPTD-control,b=PPROM-control,c=sPTD-PPROM,levels=design)
#contrasts.fit() -> Given a linear model fit to microarray data, compute estimated coefficients and standard errors for a given set of contrasts.
data.contr <- contrasts.fit(data.fit,cont.matrix)
#eBayes() -> Given a microarray linear model fit, compute moderated t-statistics, moderated F-statistic, and log-odds of differential expression by empirical Bayes moderation of the standard errors towards a common value.
data.fit.eb <- eBayes(data.contr)
data.fit.eb
## An object of class "MArrayLM"
## $coefficients
## Contrasts
## a b c
## 16650001 -0.006471273 -0.079911687 0.073440415
## 16650003 0.063567350 0.065018301 -0.001450951
## 16650005 0.035850796 -0.001419566 0.037270362
## 16650007 0.178746843 0.036739566 0.142007277
## 16650009 -0.019497922 -0.042810615 0.023312693
## 53612 more rows ...
##
## $rank
## [1] 3
##
## $assign
## [1] 1 1 1
##
## $qr
## $qr
## control sPTD PPROM
## 1 -15.09966887 0.000000 0.000000
## 2 0.06622662 -5.385165 0.000000
## 3 0.06622662 0.000000 -8.306624
## 4 0.06622662 0.000000 0.000000
## 5 0.06622662 0.000000 0.000000
## 321 more rows ...
##
## $qraux
## [1] 1.066227 1.000000 1.000000
##
## $pivot
## [1] 1 2 3
##
## $tol
## [1] 1e-07
##
## $rank
## [1] 3
##
##
## $df.residual
## [1] 323 323 323 323 323
## 53612 more elements ...
##
## $sigma
## 16650001 16650003 16650005 16650007 16650009
## 0.5887229 0.6501848 0.7107225 0.6680571 0.3141969
## 53612 more elements ...
##
## $cov.coefficients
## Contrasts
## Contrasts a b c
## a 0.038868724 0.004385965 0.03448276
## b 0.004385965 0.018878719 -0.01449275
## c 0.034482759 -0.014492754 0.04897551
##
## $stdev.unscaled
## Contrasts
## a b c
## 16650001 0.1971515 0.1373998 0.2213041
## 16650003 0.1971515 0.1373998 0.2213041
## 16650005 0.1971515 0.1373998 0.2213041
## 16650007 0.1971515 0.1373998 0.2213041
## 16650009 0.1971515 0.1373998 0.2213041
## 53612 more rows ...
##
## $pivot
## [1] 1 2 3
##
## $Amean
## 16650001 16650003 16650005 16650007 16650009
## 2.091280 3.249681 2.495044 3.774363 1.701569
## 53612 more elements ...
##
## $method
## [1] "ls"
##
## $design
## control sPTD PPROM
## 1 1 0 0
## 2 1 0 0
## 3 1 0 0
## 4 1 0 0
## 5 1 0 0
## 321 more rows ...
##
## $contrasts
## Contrasts
## Levels a b c
## control -1 -1 0
## sPTD 1 0 1
## PPROM 0 1 -1
##
## $df.prior
## [1] 3.66246
##
## $s2.prior
## [1] 0.0495584
##
## $var.prior
## [1] 0.3401842 0.2780299 0.3498472
##
## $proportion
## [1] 0.01
##
## $s2.post
## 16650001 16650003 16650005 16650007 16650009
## 0.34326438 0.41855624 0.50001877 0.44185211 0.09816852
## 53612 more elements ...
##
## $t
## Contrasts
## a b c
## 16650001 -0.05602414 -0.99268092 0.56641045
## 16650003 0.49837593 0.73142946 -0.01013413
## 16650005 0.25716124 -0.01461087 0.23816666
## 16650007 1.36395416 0.40226227 0.96534520
## 16650009 -0.31564729 -0.99444114 0.33621480
## 53612 more rows ...
##
## $df.total
## [1] 326.6625 326.6625 326.6625 326.6625 326.6625
## 53612 more elements ...
##
## $p.value
## Contrasts
## a b c
## 16650001 0.9553568 0.3216002 0.5715037
## 16650003 0.6185545 0.4650412 0.9919205
## 16650005 0.7972162 0.9883515 0.8119012
## 16650007 0.1735211 0.6877541 0.3350860
## 16650009 0.7524718 0.3207442 0.7369248
## 53612 more rows ...
##
## $lods
## Contrasts
## a b c
## 16650001 -5.732450 -5.510764 -5.502653
## 16650003 -5.622114 -5.721780 -5.643673
## 16650005 -5.704100 -5.972714 -5.618766
## 16650007 -4.899123 -5.896840 -5.234395
## 16650009 -5.689025 -5.509126 -5.593997
## 53612 more rows ...
##
## $F
## [1] 0.49833660 0.34161826 0.03469048 0.94708547 0.50673452
## 53612 more elements ...
##
## $F.p.value
## [1] 0.6080015 0.7108730 0.9659079 0.3889322 0.6029325
## 53612 more elements ...
I will now view the results of the ANOVA in the slots of the data.fit.eb object. The statistic calculated in ANOVA is the F-statistic; I can retrieve the F-statistic and its corresponding p-value for each gene from the F and F.p.value slots.
data.fit.eb$F[1:7]
## [1] 0.49833660 0.34161826 0.03469048 0.94708547 0.50673452 0.43180536 0.66442777
data.fit.eb$F.p.value[1:7]
## [1] 0.6080015 0.7108730 0.9659079 0.3889322 0.6029325 0.6497058 0.5152619
ANOVA is usually followed by a series of pairwise comparisons. The t-statistics and the resulting p-values of the pairwise comparisons are stored in the t and p.value slots.
head(data.fit.eb$t)
## Contrasts
## a b c
## 16650001 -0.05602414 -0.99268092 0.56641045
## 16650003 0.49837593 0.73142946 -0.01013413
## 16650005 0.25716124 -0.01461087 0.23816666
## 16650007 1.36395416 0.40226227 0.96534520
## 16650009 -0.31564729 -0.99444114 0.33621480
## 16650011 0.92157468 0.03116448 0.80164735
head(data.fit.eb$p.value)
## Contrasts
## a b c
## 16650001 0.9553568 0.3216002 0.5715037
## 16650003 0.6185545 0.4650412 0.9919205
## 16650005 0.7972162 0.9883515 0.8119012
## 16650007 0.1735211 0.6877541 0.3350860
## 16650009 0.7524718 0.3207442 0.7369248
## 16650011 0.3574306 0.9751574 0.4233397
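The t-statistics and p-values above can be reproduced by hand from the other slots printed for data.fit.eb. The following is a minimal sketch for probe 16650001 and contrast a, using the rounded values shown in the output. It assumes limma's usual empirical Bayes formulas (the posterior variance is a degrees-of-freedom-weighted average of the prior and residual variances), so small rounding differences are expected.

```r
# Values copied (rounded) from the printed data.fit.eb object: probe 16650001, contrast a
coefficient    <- -0.006471273  # $coefficients (log fold change)
stdev_unscaled <- 0.1971515     # $stdev.unscaled
sigma          <- 0.5887229     # $sigma (residual standard deviation)
df_residual    <- 323           # $df.residual
df_prior       <- 3.66246       # $df.prior
s2_prior       <- 0.0495584     # $s2.prior

# Posterior (shrunken) variance: weighted average of prior and residual variance
s2_post <- (df_prior * s2_prior + df_residual * sigma^2) / (df_prior + df_residual)

# Moderated t-statistic and its two-sided p-value
df_total <- df_prior + df_residual
t_mod    <- coefficient / (stdev_unscaled * sqrt(s2_post))
p_mod    <- 2 * pt(-abs(t_mod), df = df_total)

print(c(s2.post = s2_post, t = t_mod, p.value = p_mod))
```

The three numbers should agree, up to rounding, with the $s2.post, $t and $p.value entries printed for this probe.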
data.fit.eb$lods[1:7,]
## Contrasts
## a b c
## 16650001 -5.732450 -5.510764 -5.502653
## 16650003 -5.622114 -5.721780 -5.643673
## 16650005 -5.704100 -5.972714 -5.618766
## 16650007 -4.899123 -5.896840 -5.234395
## 16650009 -5.689025 -5.509126 -5.593997
## 16650011 -5.352138 -5.972358 -5.361306
## 16650013 -5.705649 -5.352044 -5.537476
The log fold changes can be found in the coefficients slot. These are the values I am most interested in.
data.fit.eb$coefficients[1:30,]
## Contrasts
## a b c
## 16650001 -0.006471273 -0.079911687 0.073440415
## 16650003 0.063567350 0.065018301 -0.001450951
## 16650005 0.035850796 -0.001419566 0.037270362
## 16650007 0.178746843 0.036739566 0.142007277
## 16650009 -0.019497922 -0.042810615 0.023312693
## 16650011 0.119715868 0.002821417 0.116894451
## 16650013 -0.023355122 -0.074819945 0.051464823
## 16650015 -0.240824828 -0.102760445 -0.138064383
## 16650017 -0.184903489 -0.033737575 -0.151165914
## 16650019 -0.158034338 -0.118865243 -0.039169094
## 16650021 0.010459997 0.070797169 -0.060337172
## 16650023 -0.157196517 -0.096101447 -0.061095070
## 16650025 -0.048758115 -0.016857621 -0.031900493
## 16650027 -0.017225681 -0.133635046 0.116409365
## 16650029 -0.177940735 -0.032063047 -0.145877688
## 16650031 -0.067085091 -0.070443983 0.003358892
## 16650033 -0.006887861 -0.075505054 0.068617193
## 16650035 -0.083017341 -0.057965620 -0.025051721
## 16650037 0.039734685 -0.046162270 0.085896955
## 16650041 -0.138101101 0.071017616 -0.209118718
## 16650043 -0.145276036 0.001155695 -0.146431730
## 16650045 -0.251177989 0.043844131 -0.295022121
## 16650047 -0.177722941 -0.058525331 -0.119197610
## 16650049 -0.045025383 -0.010555072 -0.034470310
## 16650051 -0.193146586 0.028123458 -0.221270044
## 16650053 -0.036932911 -0.019400025 -0.017532886
## 16650055 -0.108543477 -0.019790131 -0.088753346
## 16650057 0.010844757 0.033345551 -0.022500794
## 16650059 -0.098823859 0.031173852 -0.129997711
## 16650061 0.258377464 0.009429214 0.248948250
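Assuming the expression values are on the log2 scale, as limma expects, the coefficients are log2 fold changes. As a small sketch, three values copied from the output above for contrast a can be converted back to linear fold changes as follows (the variable names are illustrative only).

```r
# log2 fold changes for contrast a, copied from the output above
log_fc <- c("16650001" = -0.006471273,
            "16650045" = -0.251177989,
            "16650061" =  0.258377464)

# Convert to linear fold changes: 2^logFC (values below 1 mean lower expression)
fold_change <- 2^log_fc
print(round(fold_change, 3))

# An absolute log2 fold change of 0.5 corresponds to a fold change of about 1.41
print(2^0.5)
```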
A good way to decide on the number of DE genes I am going to select is via a volcano plot. A volcano plot is a graph that allows one to simultaneously assess the p-values (statistical significance) and log ratios (biological difference) of differential expression for the given genes.
volcanoplot(data.fit.eb, coef = 1, highlight = 10,xlim=c(-2,2),ylim=c(0,7),main="Volcano Plot of sPTD v/s control")
volcanoplot(data.fit.eb, coef = 2, highlight = 10,xlim=c(-2,2),ylim=c(0,7),main="Volcano Plot of PPROM v/s control")
volcanoplot(data.fit.eb, coef = 3, highlight = 10,xlim=c(-2,2),ylim=c(0,7),main="Volcano Plot of sPTD v/s PPROM")
Volcano plots arrange genes along two axes: biological significance (log fold change) and statistical significance.
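To make the construction of a volcano plot concrete, here is a minimal base R sketch on simulated data. The log fold changes and p-values are randomly generated, not taken from this data set, and the cutoffs of 0.5 and 0.05 are illustrative. The sketch plots -log10(p-value) on the y-axis.

```r
set.seed(1)
n_genes <- 1000

# Simulated log2 fold changes and matching two-sided p-values (illustrative only)
log_fc <- rnorm(n_genes, mean = 0, sd = 0.4)
p_val  <- 2 * pnorm(-abs(log_fc / 0.2))

# Flag genes passing both an effect-size cutoff and a significance cutoff
is_hit <- abs(log_fc) >= 0.5 & p_val < 0.05

plot(log_fc, -log10(p_val),
     pch = 16, cex = 0.5,
     col = ifelse(is_hit, "blue", "grey50"),
     xlab = "log2 fold change", ylab = "-log10(p-value)",
     main = "Volcano plot (simulated data)")
abline(v = c(-0.5, 0.5), h = -log10(0.05), lty = 2)
```

Genes of interest end up in the upper left and upper right corners, where both the fold change and the significance are large.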
Finally, I will adjust for multiple testing and define the DE genes. I am doing a t-test on each gene, meaning that I will be doing more than 20000 t-tests on the data set, so the p-values need to be corrected for multiple testing. Since I have 3 groups for the class variable, and therefore several contrasts, the decideTests() method will perform the multiple testing adjustment on these p-values. Additionally, it will evaluate for each gene whether the results in data.fit.eb fulfill the criteria for differential expression that I specify. The adjust.method argument specifies which method is used to adjust the p-values for multiple testing. The value BH means that the Benjamini-Hochberg correction will be used. The p.value argument specifies the FDR cutoff, and the lfc argument specifies the minimum absolute log2 fold change that is required for a gene to be considered DE.
The method argument specifies how the adjustment is applied across contrasts: global means that the tests of all contrasts are pooled and adjusted together.
DEresults <- decideTests(data.fit.eb,method='global',adjust.method="BH",p.value=0.05,lfc=0.5)
#method: "global" treats the entire matrix of t-statistics as a single vector of tests, so the p-values of all contrasts are adjusted together. It is the simplest choice when several contrasts are tested simultaneously, and the p-value cutoff is consistent across all contrasts.
#adjust.method: "BH" means Benjamini-Hochberg correction; alternatives include "BY" and "holm".
DEresults <- as.data.frame(DEresults)
colnames(DEresults) <- c("sPTD-control","PPROM-control","sPTD-PPROM")
ups_sPTDversusControl <- DEresults[DEresults$`sPTD-control`==1, ] #up-regulated genes for sPTD v/s control
downs_sPTDversusControl <- DEresults[DEresults$`sPTD-control`==-1, ] #down-regulated genes for sPTD v/s control
ups_PPROMversusControl <- DEresults[DEresults$`PPROM-control`==1, ] #up-regulated genes for PPROM v/s control
downs_PPROMversusControl <- DEresults[DEresults$`PPROM-control`==-1, ] #down-regulated genes for PPROM v/s control
ups_sPTDversusPPROM <- DEresults[DEresults$`sPTD-PPROM`==1, ] #up-regulated genes for sPTD v/s PPROM
downs_sPTDversusPPROM <- DEresults[DEresults$`sPTD-PPROM`==-1, ] #down-regulated genes for sPTD v/s PPROM
| | sPTD v/s control | PPROM v/s control | sPTD v/s PPROM |
|---|---|---|---|
| upregulated genes | 3 | 3 | 1 |
| downregulated genes | 1 | 0 | 2 |
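The decision rule applied by decideTests() with method = "global" can be approximated in base R. The sketch below is an illustration under stated assumptions rather than limma's actual implementation: it pools the p-values of all contrasts, applies the Benjamini-Hochberg adjustment with p.adjust(), and then codes each test as 1, -1 or 0. It uses only the first five probes of the p.value and coefficients output shown earlier, so the adjusted values differ from those obtained on the full data set.

```r
probes    <- c("16650001", "16650003", "16650005", "16650007", "16650009")
contrasts <- c("a", "b", "c")

# Raw p-values and log fold changes copied from the output above
p_raw <- matrix(c(0.9553568, 0.3216002, 0.5715037,
                  0.6185545, 0.4650412, 0.9919205,
                  0.7972162, 0.9883515, 0.8119012,
                  0.1735211, 0.6877541, 0.3350860,
                  0.7524718, 0.3207442, 0.7369248),
                nrow = 5, byrow = TRUE, dimnames = list(probes, contrasts))
log_fc <- matrix(c(-0.006471273, -0.079911687,  0.073440415,
                    0.063567350,  0.065018301, -0.001450951,
                    0.035850796, -0.001419566,  0.037270362,
                    0.178746843,  0.036739566,  0.142007277,
                   -0.019497922, -0.042810615,  0.023312693),
                 nrow = 5, byrow = TRUE, dimnames = list(probes, contrasts))

# "global": adjust all contrasts together as one long vector of tests
p_adj <- matrix(p.adjust(as.vector(p_raw), method = "BH"),
                nrow = nrow(p_raw), dimnames = dimnames(p_raw))

# Significant if adjusted p < 0.05 and |log fold change| > 0.5; the sign gives the direction
significant <- p_adj < 0.05 & abs(log_fc) > 0.5
decision    <- sign(log_fc) * significant
print(decision)
```

For these five probes no test passes the cutoffs, so every entry of the result is 0. On the full data set, counting the 1 and -1 entries per column gives the kind of summary shown in the table above.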
Finally, I’ll retrieve the annotations of the probe IDs.
Having the gene names, I can then do GO enrichment and pathway enrichment using some tools and/or databases.
But before that, I’m going to check whether there are any housekeeping genes.
## The number of Housekeeping gene equals to 0
I’ll now move to the analysis of the second type of array, the Affymetrix HTA 2.0 Array.
I use the list.files() command to obtain the list of CEL files in the folder specified by celpath. Then I import all the CEL files with a single command using the read.celfiles() method.
celpath <- "~/Desktop/oliver/HTA/"
#import CEL files containing raw probe-level data into an R object
cel_files <- list.files(celpath,full.names=TRUE) #named cel_files rather than list, so that base::list() is not masked
data <- read.celfiles(cel_files)
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Sample_10.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Sample_11.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Sample_13.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Sample_14.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Sample_19.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Sample_20.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Sample_25.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Sample_26.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Sample_28.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Sample_29.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Sample_4.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Sample_5.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_006_P1A06.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_018_P1B06.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_030_P1C06.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_041_P1D05.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_053_P1E05.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_065_P1F05.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_106_P2B02.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_107_P2C02.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_128_P2H04.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_133_P2E05.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_139_P2C06.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_145_P2A07.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_153_P2A08.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_166_P2F09.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_168_P2H09.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_190_P2F12.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_200_P3H01.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_203_P3C02.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_205_P3E02.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_206_P3F02.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_208_P3H02.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_211_P3C03.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_231_P3G05.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_233_P3A06.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_236_P3D06.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_241_P3A07.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_250_P3B08.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_252_P3D08.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_255_P3G08.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_256_P3H08.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_258_P3B09.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_259_P3C09.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_261_P3E09.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_266_P3B10.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_267_P3C10.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_269_P3E10.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_271_P3G10.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_314_P4B04.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_334_P4F06.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_337_P4A07.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_359_P4G09.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_381_P4E12.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_383_P4G12.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_412_P5D04.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_440_P5H07.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_491_P6C02.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_492_P6D02.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_499_P6C03.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_506_P6B04.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_542_P6F08.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_557_P6E10.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_566_P6F11.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_584_P7H01_2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_588_P7D02.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_591_P7G02.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_593_P7A03.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_597_P7E03.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_600_P7H03.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_611_P7C05.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_612_P7D05.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_617_P7A06.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_618_P7B06.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_622_P7F06.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_623_P7G06.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_640_P7H08.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_641_P7A09.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_648_P7H09_2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_650_P7B10.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_666_P7B12.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_667_P7C12.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_672_P7H12.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_702_P8F04.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_717_P8E06.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_726_P8F07.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_739_P8C09.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_742_P8F09.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_745_P08A10.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_751_P8G10.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_752_P8H10.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_754_P8B11.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_764_P8D12.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_765_P8E12.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_770_P9B01.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_774_P9F01.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_775_P9G01.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_777_P9A02.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_782_P9F02.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_786_P9B03.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_811_P9C06.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_813_P9E06.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_820_P9D07.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_827_P9C08.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_830_P9F08.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_834_P9B09.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_849_P8A11.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_879_P10G02.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_910_P10F06.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_911_P10G06.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_916_P10D07.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_917_P10E07.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_918_P10F07.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_919_P10G07.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_920_P10H07.CEL
The data is now a specific FeatureSet object containing the data from my CEL files.
## The number of microarray probes is equal to 6892960 and the number of microarray samples is equal to 115
## The type of the raw_data is an HTAFeatureSet
How to retrieve intensities of specific rows in the CEL files? There are two methods, exprs() and intensity(), that can obtain intensity data. Both methods return the same result: a matrix with the intensities of all probes. So I am going to use one of them.
int <- oligo::intensity(data)
int[1:10,1:10]
## Sample_10.CEL Sample_11.CEL Sample_13.CEL Sample_14.CEL Sample_19.CEL
## 1 4630 5624 4803 1111 4171
## 2 130 170 179 56 113
## 3 4305 5204 4394 1020 3941
## 4 100 100 110 49 77
## 5 66 79 51 35 66
## 6 43 42 35 33 33
## 7 72 109 57 47 99
## 8 212 310 134 53 232
## 9 49 50 30 33 47
## 10 65 87 38 35 66
## Sample_20.CEL Sample_25.CEL Sample_26.CEL Sample_28.CEL Sample_29.CEL
## 1 4984 5277 5401 5174 4669
## 2 161 153 167 194 151
## 3 4906 4877 5191 4729 4165
## 4 111 143 145 95 87
## 5 51 60 67 74 50
## 6 34 47 44 43 39
## 7 128 133 161 86 104
## 8 360 307 289 305 266
## 9 57 50 45 58 57
## 10 66 67 35 62 70
How to retrieve intensities of PM probes of specific rows in the CEL files? Since I am only working with PM probes, I might want to look at them using the pm() method.
pm <- oligo::pm(data)
pm[1:10,1:10]
## Sample_10.CEL Sample_11.CEL Sample_13.CEL Sample_14.CEL Sample_19.CEL
## 6 43 42 35 33 33
## 7 72 109 57 47 99
## 8 212 310 134 53 232
## 9 49 50 30 33 47
## 10 65 87 38 35 66
## 11 48 46 34 34 46
## 12 110 173 115 54 130
## 13 280 342 125 46 224
## 14 57 103 59 26 85
## 15 133 135 57 36 85
## Sample_20.CEL Sample_25.CEL Sample_26.CEL Sample_28.CEL Sample_29.CEL
## 6 34 47 44 43 39
## 7 128 133 161 86 104
## 8 360 307 289 305 266
## 9 57 50 45 58 57
## 10 66 67 35 62 70
## 11 66 35 48 68 79
## 12 210 204 156 194 208
## 13 443 355 277 470 352
## 14 121 79 98 89 102
## 15 168 142 111 125 119
Apart from the expression data itself, microarray data sets need to include information about the samples that were hybridized to the arrays. One of them is called phenoData. It contains labels for the samples. However, for most data sets the phenoData has not been defined. How to retrieve the sample annotation of the data?
ph <- data@phenoData; ph
## An object of class 'AnnotatedDataFrame'
## rowNames: Sample_10.CEL Sample_11.CEL ... Tarca_920_P10H07.CEL (115
## total)
## varLabels: index
## varMetadata: labelDescription channel
I’ll finally retrieve the first few and last few IDs of the probe sets that are represented on the arrays.
head(featureNames(data))
## [1] "1" "2" "3" "4" "5" "6"
tail(featureNames(data))
## [1] "6892955" "6892956" "6892957" "6892958" "6892959" "6892960"
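expr <- Biobase::exprs(data) #extract the intensity matrix once and reuse it
NA_values <- which(is.na(expr), arr.ind=TRUE) #positions of missing values
NaN_values <- which(apply(expr, 2, function(x) any(is.nan(x)))) #arrays containing at least one NaN (any() rather than all(), which would only flag arrays made up entirely of NaN)
infinite_values <- which(apply(expr, 2, function(x) any(is.infinite(x)))) #arrays containing at least one infinite value
blank_values <- function(x) sum(x=="", na.rm=TRUE) #number of blank entries in a vector
bvalues <- apply(expr, 2, blank_values) #blank entries per array
count <- sum(bvalues!=0) #number of arrays with at least one blank entry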
| | Count |
|---|---|
| NA values | 0 |
| NaN values | 0 |
| Infinite values | 0 |
| Blank values | 0 |
Since the phenoData object that was created in the step where I retrieved the sample annotation does not contain any information, Bioconductor just gives the CEL files an index from 1 to 115. However, the phenoData will be used for the labels in plots, so I am going to give the samples more informative names that can be used in the plots I am going to create.
ph@data[,1] <- c(paste0("Control",1:59),paste0("PPROM",1:29),paste0("sPTD",1:27)); ph
## An object of class 'AnnotatedDataFrame'
## rowNames: Sample_10.CEL Sample_11.CEL ... Tarca_920_P10H07.CEL (115
## total)
## varLabels: index
## varMetadata: labelDescription channel
It’s time to create some plots to assess the quality of the data.
The picture of a microarray can show large inconsistencies on an individual array. How to print the raw intensities of a microarray?
image(data[,1], main=ph@data[1,1]) #the sample names are in the first column of ph@data (named index); there is no column called sample
Another quality control check is to draw boxplots of the arrays. A boxplot is a standardized way of displaying a data set based on a five-number summary: the minimum, the first quartile, the median, the third quartile, and the maximum.
oligo::boxplot(data,ylim = c(0,9),target = "core", main = "Boxplot of log2-intensities for the raw data",las=2,names=c(paste0("Control",1:59),paste0("PPROM",1:29),paste0("sPTD",1:27)),col=rep(c("red","green","blue"),times=c(59,29,27)))
When I look at the boxplot, I see that the intensity distributions of the individual arrays are quite different, indicating the need for an appropriate normalization.
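The five numbers behind each box can be computed directly with fivenum(). As a small sketch, the following uses the first ten raw intensities of Sample_10.CEL shown in the intensity matrix earlier, log2-transformed as in the boxplot. Ten probes are far too few to describe an array, so this only illustrates the calculation; note also that fivenum() reports hinges, which can differ slightly from quartiles.

```r
# First ten raw probe intensities of Sample_10.CEL, copied from the output above
raw_intensity <- c(4630, 130, 4305, 100, 66, 43, 72, 212, 49, 65)

# The boxplot is drawn on the log2 scale
log_intensity <- log2(raw_intensity)

# Minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(log_intensity)
names(five) <- c("min", "lower hinge", "median", "upper hinge", "max")
print(round(five, 2))
```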
A third quality control is the creation of density estimates for a few samples.
The standard method for normalization is RMA, which is one of the few normalization methods that uses only the PM probes.
data.rma <- oligo::rma(data)
## Background correcting
## Normalizing
## Calculating Expression
data.matrix <- Biobase::exprs(data.rma)
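RMA consists of three steps, which match the messages printed above: background correction, quantile normalization, and summarization of the probes into probe sets. As an illustration of the middle step only, here is a minimal base R sketch of quantile normalization applied to the first ten probes of five arrays from the intensity output shown earlier. The function name quantile_normalize is hypothetical, and the implementation used by oligo differs in details such as tie handling.

```r
# Raw intensities of ten probes (rows) on five arrays (columns), copied from above
x <- cbind(Sample_10 = c(4630, 130, 4305, 100, 66, 43, 72, 212, 49, 65),
           Sample_11 = c(5624, 170, 5204, 100, 79, 42, 109, 310, 50, 87),
           Sample_13 = c(4803, 179, 4394, 110, 51, 35, 57, 134, 30, 38),
           Sample_14 = c(1111, 56, 1020, 49, 35, 33, 47, 53, 33, 35),
           Sample_19 = c(4171, 113, 3941, 77, 66, 33, 99, 232, 47, 66))

quantile_normalize <- function(m) {
  ranks  <- apply(m, 2, rank, ties.method = "first")  # rank of each value within its array
  target <- rowMeans(apply(m, 2, sort))               # mean of the k-th smallest values across arrays
  normalized <- apply(ranks, 2, function(r) target[r])
  dimnames(normalized) <- dimnames(m)
  normalized
}

x_norm <- quantile_normalize(log2(x))

# After normalization every array has the same distribution of values
print(round(apply(x_norm, 2, sort), 2))
```

Each column of the sorted output is identical, which is exactly what quantile normalization is designed to achieve; only the order of the probes within each array differs.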
Normalization can also be done using the GCRMA algorithm.
After normalization, I need to visualize the normalized data again. For that, I’ll plot boxplots for some microarrays.
I will now perform a Principal Component Analysis (PCA) in order to check whether the overall variability of the samples reflects their grouping. But before that, what is PCA and what does it do?
PCA is a standard technique for visualizing high-dimensional data and for data pre-processing. PCA reduces the dimensionality (the number of variables) of a data set while retaining as much variance as possible. Illustrated are three-dimensional gene expression data that are mainly located within a two-dimensional subspace. PCA is used to visualize these data by reducing their dimensionality: the three original variables (genes) are reduced to two new variables termed principal components (PCs). Such a two-dimensional visualization of the samples allows us to draw qualitative conclusions about the separability of experimental conditions (marked by different colors).
Legend:
Left side: I can identify the two-dimensional plane that optimally describes the highest variance of the data.
Right side: This two-dimensional subspace can then be rotated and presented as a two-dimensional component space.
Class:Color
Control:red
PPROM:green
sPTD:blue
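The idea can be reproduced on simulated data. The sketch below generates three variables that lie close to a two-dimensional plane (the third is the sum of the first two plus a little noise) and uses prcomp() to show that two principal components capture nearly all of the variance. The data and the group shifts are made up for illustration.

```r
set.seed(42)
n <- 90
group <- rep(c("Control", "PPROM", "sPTD"), each = n / 3)

# Two underlying sources of variation; the groups are shifted along the first one
g1 <- rnorm(n) + rep(c(-2, 0, 2), each = n / 3)
g2 <- rnorm(n)
g3 <- g1 + g2 + rnorm(n, sd = 0.05)   # nearly a linear combination of g1 and g2
expr <- cbind(gene1 = g1, gene2 = g2, gene3 = g3)

pca <- prcomp(expr, center = TRUE, scale. = FALSE)

# Proportion of variance explained by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 4))

# Samples in the space of the first two principal components
plot(pca$x[, 1], pca$x[, 2],
     col = c(Control = "red", PPROM = "green", sPTD = "blue")[group],
     pch = 16, xlab = "PC1", ylab = "PC2")
```

Because the third variable carries almost no independent information, the third component explains only a tiny fraction of the variance and can be dropped without losing much.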
I’ll now create a PCA plot using the prcomp() method.
color<-rep(c("red","green","blue"),times=c(59,29,27))
data.PC <- prcomp(t(data.matrix),scale.=TRUE)
#t(): transposes the matrix so that the samples are rows and the probe sets are columns
#scale.: a logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place. The default is FALSE for consistency with S, but in general scaling is advisable. Alternatively, a vector of length equal to the number of columns of x can be supplied. The value is passed to scale.
plot(data.PC$x[,1],col=color,ylab="PC1") #scores of the 115 samples on the first principal component
As an example I will compare spontaneous preterm labor and delivery with intact membranes (sPTD) and preterm premature rupture of the membranes (PPROM) to a set of Control women.
I first need to tell limma which samples are replicates and which samples belong to different groups by providing this information in the phenoData slot of the HTAFeatureSet. To this end, I will add a second column with sample annotation describing the source of each sample. I will then give this new column a name.
ph@data[ ,2] <- rep(c("Control","PPROM","sPTD"),times=c(59,29,27))
colnames(ph@data)[2] <- "level"
So, the factor that determines the grouping will have 3 levels.
groups <- ph@data$level
f <- factor(groups,levels = c("Control","sPTD","PPROM"))
Then, I need to create a design matrix. Since limma performs an ANOVA, it needs such a matrix to know which samples belong to which group. I will create it using the model.matrix() method.
design <- model.matrix(~ 0 + f)
colnames(design) <- c("Control","sPTD","PPROM")
#Fit linear model for each gene given a series of arrays
data.fit <- lmFit(object = data.rma, design = design)
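To see what the design matrix encodes, here is a minimal base R sketch on a toy example with six samples and one made-up gene. With a no-intercept design (~ 0 + f), each column is an indicator for one group, so the least-squares coefficients are simply the group means; lmFit() fits this kind of model for every probe set. The expression values below are invented.

```r
groups <- c("Control", "Control", "sPTD", "sPTD", "PPROM", "PPROM")
f <- factor(groups, levels = c("Control", "sPTD", "PPROM"))

# One indicator column per group, no intercept
design <- model.matrix(~ 0 + f)
colnames(design) <- levels(f)
print(design)

# Invented log2 expression values for a single gene
y <- c(8.1, 7.9, 9.0, 9.4, 8.5, 8.3)

# Least-squares fit: the coefficients equal the group means
fit <- lm.fit(x = design, y = y)
print(fit$coefficients)
print(tapply(y, f, mean))
```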
Afterwards, I need to tell limma which groups I want to compare. For this I define a contrast matrix defining the contrasts (comparisons) of interest by using the makeContrasts() method.
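A contrast is just a linear combination of the group coefficients. As a rough base R sketch (without limma), the matrix below mirrors the contrast matrix for the three comparisons a, b and c, and multiplying hypothetical group means by it yields the three pairwise differences. The group means are invented for illustration.

```r
# Rows: groups (levels of the design); columns: contrasts a, b, c
contrast_matrix <- matrix(c(-1, -1,  0,    # Control
                             1,  0,  1,    # sPTD
                             0,  1, -1),   # PPROM
                          nrow = 3, byrow = TRUE,
                          dimnames = list(Levels = c("Control", "sPTD", "PPROM"),
                                          Contrasts = c("a", "b", "c")))

# Hypothetical group means (log2 scale) for one gene
group_means <- c(Control = 8.0, sPTD = 9.2, PPROM = 8.4)

# Each contrast is a difference of two group means
log_fc <- drop(group_means %*% contrast_matrix)
print(log_fc)
```

Contrast a is sPTD minus Control, b is PPROM minus Control, and c is sPTD minus PPROM, so c always equals a minus b.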
contrast.matrix <- makeContrasts(a=sPTD-Control,b=PPROM-Control,c=sPTD-PPROM,levels=design)
data.fit.con <- contrasts.fit(data.fit,contrast.matrix)
data.fit.eb <- eBayes(data.fit.con)
data.fit.eb
## An object of class "MArrayLM"
## $coefficients
## Contrasts
## a b c
## 2824546_st 0.1957635 -0.01687591 0.21263942
## 2824549_st 0.2927558 0.15484979 0.13790597
## 2824551_st 0.2634952 0.18959655 0.07389866
## 2824554_st 0.3362027 0.28500707 0.05119561
## 2827992_st 0.5283583 0.49529253 0.03306574
## 70518 more rows ...
##
## $rank
## [1] 3
##
## $assign
## [1] 1 1 1
##
## $qr
## $qr
## Control sPTD PPROM
## 1 -7.6811457 0.000000 0.000000
## 2 0.1301889 -5.196152 0.000000
## 3 0.1301889 0.000000 -5.385165
## 4 0.1301889 0.000000 0.000000
## 5 0.1301889 0.000000 0.000000
## 110 more rows ...
##
## $qraux
## [1] 1.130189 1.000000 1.000000
##
## $pivot
## [1] 1 2 3
##
## $tol
## [1] 1e-07
##
## $rank
## [1] 3
##
##
## $df.residual
## [1] 112 112 112 112 112
## 70518 more elements ...
##
## $sigma
## 2824546_st 2824549_st 2824551_st 2824554_st 2827992_st
## 0.9062726 0.6663917 0.9200431 1.1218266 0.8486486
## 70518 more elements ...
##
## $cov.coefficients
## Contrasts
## Contrasts a b c
## a 0.05398619 0.01694915 0.03703704
## b 0.01694915 0.05143191 -0.03448276
## c 0.03703704 -0.03448276 0.07151980
##
## $stdev.unscaled
## Contrasts
## a b c
## 2824546_st 0.2323493 0.226786 0.2674319
## 2824549_st 0.2323493 0.226786 0.2674319
## 2824551_st 0.2323493 0.226786 0.2674319
## 2824554_st 0.2323493 0.226786 0.2674319
## 2827992_st 0.2323493 0.226786 0.2674319
## 70518 more rows ...
##
## $pivot
## [1] 1 2 3
##
## $Amean
## 2824546_st 2824549_st 2824551_st 2824554_st 2827992_st
## 9.356964 9.038516 8.745541 8.576868 8.669450
## 70518 more elements ...
##
## $method
## [1] "ls"
##
## $design
## Control sPTD PPROM
## 1 1 0 0
## 2 1 0 0
## 3 1 0 0
## 4 1 0 0
## 5 1 0 0
## 110 more rows ...
##
## $contrasts
## Contrasts
## Levels a b c
## Control -1 -1 0
## sPTD 1 0 1
## PPROM 0 1 -1
##
## $df.prior
## [1] 3.655714
##
## $s2.prior
## [1] 0.02530346
##
## $var.prior
## [1] 1.7649239 1.0304117 0.9079151
##
## $proportion
## [1] 0.01
##
## $s2.post
## 2824546_st 2824549_st 2824551_st 2824554_st 2827992_st
## 0.7961687 0.4308411 0.8205231 1.2195156 0.6982396
## 70518 more elements ...
##
## $t
## Contrasts
## a b c
## 2824546_st 0.9442519 -0.0833966 0.8911034
## 2824549_st 1.9195771 1.0402452 0.7856179
## 2824551_st 1.2519471 0.9229299 0.3050549
## 2824554_st 1.3102862 1.1380088 0.1733508
## 2827992_st 2.7213535 2.6136247 0.1479663
## 70518 more rows ...
##
## $df.total
## [1] 115.6557 115.6557 115.6557 115.6557 115.6557
## 70518 more elements ...
##
## $p.value
## Contrasts
## a b c
## 2824546_st 0.347009380 0.93368036 0.3747238
## 2824549_st 0.057374816 0.30039573 0.4336979
## 2824551_st 0.213115114 0.35796516 0.7608726
## 2824554_st 0.192695327 0.25746876 0.8626787
## 2827992_st 0.007506803 0.01014979 0.8826270
## 70518 more rows ...
##
## $lods
## Contrasts
## a b c
## 2824546_st -5.919161 -6.114861 -5.533759
## 2824549_st -4.579553 -5.600944 -5.615907
## 2824551_st -5.592055 -5.710613 -5.860135
## 2824554_st -5.519962 -5.499742 -5.889574
## 2827992_st -2.844309 -2.934606 -5.893386
## 70518 more rows ...
##
## $F
## [1] 0.5293839 1.9420843 0.9346243 1.1447720 5.3880558
## 70518 more elements ...
##
## $F.p.value
## [1] 0.590387938 0.148052969 0.395679018 0.321876206 0.005790017
## 70518 more elements ...
I will now look at the results of the ANOVA, which are stored in the slots of the data.fit.eb object. The statistic calculated in ANOVA is the F-statistic; the F-statistic and its corresponding p-value for each gene can be retrieved from the F and F.p.value slots.
head(data.fit.eb$F)
## [1] 0.5293839 1.9420843 0.9346243 1.1447720 5.3880558 2.4682771
head(data.fit.eb$F.p.value)
## [1] 0.590387938 0.148052969 0.395679018 0.321876206 0.005790017 0.089184085
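Because one F-test is run per probe set, the raw F p-values also need a multiple-testing adjustment before they are interpreted. The following is only a minimal base-R sketch: it reuses the six p-values printed above (the names `f_p`, `f_p_adj` and `n_sig` are my own), whereas a real analysis would adjust the whole data.fit.eb$F.p.value vector at once, which gives different adjusted values.

```r
# Six F-test p-values copied from the head() output above (toy subset only).
f_p <- c(0.590387938, 0.148052969, 0.395679018,
         0.321876206, 0.005790017, 0.089184085)

# Benjamini-Hochberg adjustment with base R's p.adjust().
f_p_adj <- p.adjust(f_p, method = "BH")

# Number of probe sets in this toy subset with an adjusted p-value below 0.05.
n_sig <- sum(f_p_adj < 0.05)

print(round(f_p_adj, 4))
print(n_sig)
```

Adjusted p-values are never smaller than the raw ones, so a gene that looks borderline on the raw scale can easily drop out after adjustment.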
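ANOVA is typically followed by a series of pairwise comparisons. The moderated t-statistics and the resulting p-values of the pairwise comparisons are stored in the t and p.value slots, and the log-odds of differential expression (B-statistics) are stored in the lods slot.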
data.fit.eb$t[1:10,]
## Contrasts
## a b c
## 2824546_st 0.9442519 -0.0833966 0.89110339
## 2824549_st 1.9195771 1.0402452 0.78561791
## 2824551_st 1.2519471 0.9229299 0.30505491
## 2824554_st 1.3102862 1.1380088 0.17335082
## 2827992_st 2.7213535 2.6136247 0.14796633
## 2827995_st 1.7580437 1.8518982 -0.04301835
## 2827996_st 2.6059483 2.9737162 -0.25766239
## 2828010_st 2.3791492 3.4213779 -0.83433278
## 2828012_st -0.2489306 0.5421243 -0.67600429
## 2835442_st 1.9643820 1.4253368 0.49798202
data.fit.eb$p.value[1:10,]
## Contrasts
## a b c
## 2824546_st 0.347009380 0.9336803630 0.3747238
## 2824549_st 0.057374816 0.3003957335 0.4336979
## 2824551_st 0.213115114 0.3579651610 0.7608726
## 2824554_st 0.192695327 0.2574687581 0.8626787
## 2827992_st 0.007506803 0.0101497910 0.8826270
## 2827995_st 0.081385543 0.0665907706 0.9657611
## 2827996_st 0.010366893 0.0035814744 0.7971254
## 2828010_st 0.018989804 0.0008619341 0.4058135
## 2828012_st 0.803855965 0.5887758773 0.5003875
## 2835442_st 0.051884453 0.1567530869 0.6194423
data.fit.eb$lods[1:10,]
## Contrasts
## a b c
## 2824546_st -5.919161 -6.1148607 -5.533759
## 2824549_st -4.579553 -5.6009443 -5.615907
## 2824551_st -5.592055 -5.7106125 -5.860135
## 2824554_st -5.519962 -5.4997425 -5.889574
## 2827992_st -2.844309 -2.9346059 -5.893386
## 2827995_st -4.861740 -4.4959277 -5.902755
## 2827996_st -3.127251 -2.0323862 -5.872593
## 2828010_st -3.651428 -0.7746806 -5.579238
## 2828012_st -6.323437 -5.9772155 -5.690435
## 2835442_st -4.497164 -5.1512097 -5.787821
The log fold changes can be found in the coefficients slot. This is what we are interested in.
data.fit.eb$coefficients[1:30,]
## Contrasts
## a b c
## 2824546_st 0.195763508 -0.016875912 0.21263942
## 2824549_st 0.292755760 0.154849791 0.13790597
## 2824551_st 0.263495211 0.189596554 0.07389866
## 2824554_st 0.336202679 0.285007071 0.05119561
## 2827992_st 0.528358275 0.495292534 0.03306574
## 2827995_st 0.394706202 0.405822747 -0.01111654
## 2827996_st 0.639348606 0.712108951 -0.07276034
## 2828010_st 0.629716499 0.883892470 -0.25417597
## 2828012_st -0.073093633 0.155372834 -0.22846647
## 2835442_st 0.491023025 0.347751010 0.14327202
## 2835447_st 0.268143915 -0.026855591 0.29499951
## 2835453_st 0.728764374 -0.742317167 1.47108154
## 2835456_st 0.259359766 0.279606306 -0.02024654
## 2835459_st 0.094126933 0.172027392 -0.07790046
## 2835461_st 0.271307069 0.259616677 0.01169039
## 2839509_st 0.152366626 0.035537794 0.11682883
## 2839511_st 0.185356297 0.115081206 0.07027509
## 2839513_st 0.162886537 0.192477172 -0.02959064
## 2839515_st 0.011226634 -0.019332259 0.03055889
## 2839517_st 0.026322657 0.062023632 -0.03570098
## 2839524_st 0.258321978 0.197476333 0.06084565
## 2839528_st 0.427048180 0.309238227 0.11780995
## 2839532_st 0.258582841 0.290925441 -0.03234260
## 2839538_st 0.006356328 -0.020058561 0.02641489
## 2839539_st -0.213419168 -0.240033563 0.02661440
## 2858288_st 0.092428444 -0.115961172 0.20838962
## 2886354_st 0.082782659 0.009741304 0.07304136
## 2886356_st 0.204621528 0.068518174 0.13610335
## 2886364_st 0.354728838 0.148670865 0.20605797
## 2886370_st 0.247361073 -0.005191360 0.25255243
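The slots printed above are linked: the moderated t-statistic is the coefficient divided by its moderated standard error, and the p-value comes from a t-distribution with df.total degrees of freedom. As a rough check, the sketch below recomputes both for probe set 2827992_st in contrast a, using only the rounded values shown in the output (the variable names are my own), so the results should agree with the t and p.value slots up to rounding.

```r
# Values printed above for probe set 2827992_st, contrast a (sPTD v/s control).
coef_a         <- 0.528358275   # $coefficients: log2 fold change
stdev_unscaled <- 0.2323493     # $stdev.unscaled
s2_post        <- 0.6982396     # $s2.post: posterior (moderated) variance

# Total degrees of freedom: $df.residual + $df.prior (printed as 115.6557).
df_total <- 112 + 3.655714

# Moderated t-statistic: coefficient over its moderated standard error.
t_mod <- coef_a / (stdev_unscaled * sqrt(s2_post))

# Two-sided p-value from the t-distribution with df_total degrees of freedom.
p_mod <- 2 * pt(-abs(t_mod), df = df_total)

print(t_mod)  # compare with the $t entry for this probe set
print(p_mod)  # compare with the $p.value entry for this probe set
```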
A convenient way to decide how many DE genes to select is a volcano plot. I want to find genes that are DE between the groups compared in each contrast, so I draw one plot per contrast; the highlight parameter specifies the number of highest-scoring genes whose names will be attached to the plot.
volcanoplot(data.fit.eb, coef = 1, highlight = 10,xlim=c(-2,2),ylim=c(0,7),main="Volcano Plot of sPTD v/s control")
volcanoplot(data.fit.eb, coef = 2, highlight = 10,xlim=c(-2,2),ylim=c(0,7),main="Volcano Plot of PPROM v/s control")
volcanoplot(data.fit.eb, coef = 3, highlight = 10,xlim=c(-2,2),ylim=c(0,7),main="Volcano Plot of sPTD v/s PPROM")
Volcano plots arrange genes along biological and statistical significance. The X-axis gives the log fold change between the two groups, and the Y-axis represents the statistical significance of the moderated t-test comparing them, shown as -log10 of the p-value (the default style in recent versions of limma). Hence, the first axis indicates the biological impact of the change; the second indicates the statistical evidence for the change.
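To make the two axes concrete, here is a small base-R sketch that builds the volcano coordinates by hand for the first five probe sets of contrast a, using the log fold changes and p-values printed earlier. It assumes the -log10(p-value) scale on the Y-axis; the object names (`volcano`, `top_probe`) are mine and not part of limma.

```r
# Contrast a (sPTD v/s control): first five probe sets printed above.
probes <- c("2824546_st", "2824549_st", "2824551_st", "2824554_st", "2827992_st")
log_fc <- c(0.195763508, 0.292755760, 0.263495211, 0.336202679, 0.528358275)
p_val  <- c(0.347009380, 0.057374816, 0.213115114, 0.192695327, 0.007506803)

# Volcano coordinates: effect size on x, -log10(p-value) on y.
volcano <- data.frame(logFC = log_fc,
                      neg_log10_p = -log10(p_val),
                      row.names = probes)

# The probe set highest on the y-axis is the one with the smallest p-value.
top_probe <- rownames(volcano)[which.max(volcano$neg_log10_p)]

print(volcano)
print(top_probe)
```

Passing the two columns to plot() would draw a bare-bones version of the figure; points far from zero on the X-axis and high on the Y-axis are the candidates of interest.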
Finally, I am doing a t-test on each probe set for each contrast, meaning that I will be doing more than 70000 t-tests per contrast on the data set (the output above lists 70523 probe sets). I have to adjust the p-values of the t-tests for multiple testing. Of course, my final aim is to generate the list of DE genes (the genes with the lowest adjusted p-values and the most extreme log fold changes). I will then use their IDs to search for functional relations between the genes.
Since I have 3 groups for the class variable, and therefore three contrasts, the decideTests() method will perform the multiple testing adjustment on these p-values. With method='global', the p-values of all contrasts are pooled and adjusted together. Additionally, decideTests() evaluates for each gene whether the results in data.fit.eb fulfil the criteria for differential expression that I specify. The adjust.method argument specifies which method is used to adjust the p-values for multiple testing; the value BH means that the Benjamini-Hochberg correction will be used. The p.value argument specifies the FDR cutoff, and the lfc argument specifies the minimum absolute log2 fold change required to be considered DE.
DEresults <- decideTests(data.fit.eb,method='global',adjust.method="BH",p.value=0.05,lfc=0.7)
#adjust.method: character string specifying p-value adjustment method. Possible values are "none", "BH", "fdr" (equivalent to "BH"), "BY" and "holm".
DEresults <- as.data.frame(DEresults)
colnames(DEresults) <- c("sPTD-control","PPROM-control","sPTD-PPROM")
ups_sPTDversusControl <- DEresults[DEresults$`sPTD-control`==1, ] #up-regulated genes for sPTD v/s control
downs_sPTDversusControl <- DEresults[DEresults$`sPTD-control`==-1, ] #down-regulated genes for sPTD v/s control
ups_PPROMversusControl <- DEresults[DEresults$`PPROM-control`==1, ] #up-regulated genes for PPROM v/s control
downs_PPROMversusControl <- DEresults[DEresults$`PPROM-control`==-1, ] #down-regulated genes for PPROM v/s control
ups_sPTDversusPPROM <- DEresults[DEresults$`sPTD-PPROM`==1, ] #up-regulated genes for sPTD v/s PPROM
downs_sPTDversusPPROM <- DEresults[DEresults$`sPTD-PPROM`==-1, ] #down-regulated genes for sPTD v/s PPROM
| | sPTD v/s control | PPROM v/s control | sPTD v/s PPROM |
|---|---|---|---|
| upregulated genes | 33 | 16 | 7 |
| downregulated genes | 41 | 14 | 14 |
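The logic behind these counts can be sketched in base R. The example below is an approximation of what a global BH adjustment with an FDR and a log fold change cutoff does, not limma's actual implementation, and it uses only the ten probe sets printed earlier, so its counts are not comparable to the table above (the real adjustment runs over all probe sets). The object names `coefs`, `pvals`, `adj`, `calls` and `counts` are my own.

```r
probes <- c("2824546_st", "2824549_st", "2824551_st", "2824554_st", "2827992_st",
            "2827995_st", "2827996_st", "2828010_st", "2828012_st", "2835442_st")
dn <- list(probes, c("a", "b", "c"))

# Log fold changes copied from the coefficients slot printed above.
coefs <- matrix(c(
   0.195763508, -0.016875912,  0.21263942,
   0.292755760,  0.154849791,  0.13790597,
   0.263495211,  0.189596554,  0.07389866,
   0.336202679,  0.285007071,  0.05119561,
   0.528358275,  0.495292534,  0.03306574,
   0.394706202,  0.405822747, -0.01111654,
   0.639348606,  0.712108951, -0.07276034,
   0.629716499,  0.883892470, -0.25417597,
  -0.073093633,  0.155372834, -0.22846647,
   0.491023025,  0.347751010,  0.14327202),
  ncol = 3, byrow = TRUE, dimnames = dn)

# Raw p-values copied from the p.value slot printed above.
pvals <- matrix(c(
  0.347009380, 0.9336803630, 0.3747238,
  0.057374816, 0.3003957335, 0.4336979,
  0.213115114, 0.3579651610, 0.7608726,
  0.192695327, 0.2574687581, 0.8626787,
  0.007506803, 0.0101497910, 0.8826270,
  0.081385543, 0.0665907706, 0.9657611,
  0.010366893, 0.0035814744, 0.7971254,
  0.018989804, 0.0008619341, 0.4058135,
  0.803855965, 0.5887758773, 0.5003875,
  0.051884453, 0.1567530869, 0.6194423),
  ncol = 3, byrow = TRUE, dimnames = dn)

# "Global" adjustment: pool the p-values of all contrasts and apply BH once.
adj <- matrix(p.adjust(as.vector(pvals), method = "BH"),
              nrow = nrow(pvals), dimnames = dn)

# A probe set is called DE when it passes both cutoffs; the sign gives the direction.
calls <- sign(coefs) * (adj < 0.05 & abs(coefs) > 0.7)

# Up- and down-regulated counts per contrast, laid out like the table above.
counts <- rbind(upregulated   = colSums(calls == 1),
                downregulated = colSums(calls == -1))
print(counts)
```

In the workflow itself, the same kind of summary could be obtained from the DEresults data frame by counting the 1 and -1 entries in each column.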
Finally, I’ll get the annotations for the probe IDs.
GPL17586.45144.4 <- read.delim("~/Downloads/GPL17586-45144-4.txt", comment.char="#")
index1 <- c()
v1 <- (rownames(ups_sPTDversusControl))
for(a in v1){ index1 <- c(index1, rownames(GPL17586.45144.4[ GPL17586.45144.4$ID==a, ])) }
write.table(GPL17586.45144.4 [ index1 , c(1,2,3,4,5,6,8) ], file = "~/Desktop/microarray_analysis/Human Transcriptome Array 2.0 /upregulated_sPTDversusControl.xls",col.names=NA,sep="\t",quote=F)
index2 <- c()
v2 <- (rownames(downs_sPTDversusControl))
for(b in v2){ index2 <- c(index2, rownames(GPL17586.45144.4[ GPL17586.45144.4$ID==b, ])) }
write.table(GPL17586.45144.4 [ index2 , c(1,2,3,4,5,6,8) ], file = "~/Desktop/microarray_analysis/Human Transcriptome Array 2.0 /downregulated_sPTDversusControl.xls",col.names=NA,sep="\t",quote=F)
index3 <- c()
v3 <- (rownames(ups_PPROMversusControl))
for(c in v3){ index3 <- c(index3, rownames(GPL17586.45144.4[ GPL17586.45144.4$ID==c, ])) }
write.table(GPL17586.45144.4 [ index3 , c(1,2,3,4,5,6,8) ], file = "~/Desktop/microarray_analysis/Human Transcriptome Array 2.0 /upregulated_PPROMversusControl.xls",col.names=NA,sep="\t",quote=F)
index4 <- c()
v4 <- (rownames(downs_PPROMversusControl))
for(d in v4){ index4 <- c(index4, rownames(GPL17586.45144.4[ GPL17586.45144.4$ID==d, ])) }
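# Down-regulated genes for PPROM v/s control go to their own file
write.table(GPL17586.45144.4 [ index4 , c(1,2,3,4,5,6,8) ], file = "~/Desktop/microarray_analysis/Human Transcriptome Array 2.0 /downregulated_PPROMversusControl.xls",col.names=NA,sep="\t",quote=F)
index5 <- c()
v5 <- (rownames(downs_sPTDversusPPROM))
for(e in v5){ index5 <- c(index5, rownames(GPL17586.45144.4[ GPL17586.45144.4$ID==e, ])) }
write.table(GPL17586.45144.4 [ index5 , c(1,2,3,4,5,6,8) ], file = "~/Desktop/microarray_analysis/Human Transcriptome Array 2.0 /downregulated_sPTDversusPPROM.xls",col.names=NA,sep="\t",quote=F)
# Up-regulated genes for sPTD v/s PPROM
index6 <- c()
v6 <- (rownames(ups_sPTDversusPPROM))
for(f in v6){ index6 <- c(index6, rownames(GPL17586.45144.4[ GPL17586.45144.4$ID==f, ])) }
write.table(GPL17586.45144.4 [ index6 , c(1,2,3,4,5,6,8) ], file = "~/Desktop/microarray_analysis/Human Transcriptome Array 2.0 /upregulated_sPTDversusPPROM.xls",col.names=NA,sep="\t",quote=F)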
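The loops above look up each probe ID one at a time. A vectorised alternative is base R's match(), sketched below on a toy annotation table. The data frame `annot`, its made-up gene symbols, and the vector `ids` are hypothetical stand-ins for the platform file and the row names of the DE tables, and the sketch assumes that every ID occurs at most once in the annotation.

```r
# Toy stand-in for the platform annotation table (symbols are made up).
annot <- data.frame(
  ID     = c("2824546_st", "2824549_st", "2827992_st", "2828010_st"),
  symbol = c("geneA", "geneB", "geneC", "geneD"),
  stringsAsFactors = FALSE
)

# Probe IDs of interest, standing in for e.g. rownames(ups_sPTDversusControl).
ids <- c("2828010_st", "2824549_st", "not_on_array")

# match() gives the row position of each ID, or NA when the ID is absent.
rows <- match(ids, annot$ID)

# Keep the annotated rows, in the order of the requested IDs.
hits <- annot[rows[!is.na(rows)], ]
print(hits)
```

The resulting data frame can be passed to write.table() in the same way as the indexed subsets above.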
With the gene names in hand, I can finally do some enrichment analysis, such as network and pathway analysis.
But before that, I’m going to check whether there are any housekeeping genes.
## The number of Housekeeping gene equals to 2809
This brings us to the end of the workflow for differential gene expression using Affymetrix microarrays.